Hi Lorenzo,

Thanks for the input. For Krylov subspace solvers most of the work is usually in the matrix-vector multiplies (and possibly in the preconditioner, if you provide your own), so I'm not sure what you mean by scaling in this case. Are you running, e.g., conjugate gradients on a dense Eigen::Matrix (I assume not) or on a sparse Eigen::SparseMatrix? In the latter case, most of the scaling would come from the sparse matrix-vector product. The vector operations (dot products, norms, etc.) in the Krylov solvers are not multi-threaded in Eigen, AFAICT.
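For reference, here is a rough, untested sketch of the setup I'm assuming (the problem size and thread count are arbitrary): with a row-major sparse matrix, the Lower|Upper option, and Eigen built with OpenMP (e.g. -fopenmp), the mat-vec inside CG is the part that can actually use several threads:

#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>
#include <vector>

int main()
{
  Eigen::initParallel();
  Eigen::setNbThreads(8);  // threads available to Eigen's OpenMP-enabled kernels

  // 1-D Laplacian as a small stand-in for a real SPD problem.
  const int n = 200000;
  std::vector<Eigen::Triplet<double>> coeffs;
  for (int i = 0; i < n; ++i) {
    coeffs.emplace_back(i, i, 2.0);
    if (i + 1 < n) {
      coeffs.emplace_back(i, i + 1, -1.0);
      coeffs.emplace_back(i + 1, i, -1.0);
    }
  }
  Eigen::SparseMatrix<double, Eigen::RowMajor> A(n, n);
  A.setFromTriplets(coeffs.begin(), coeffs.end());
  Eigen::VectorXd b = Eigen::VectorXd::Ones(n);

  // The sparse matrix-vector product inside CG is what OpenMP can
  // parallelize (row-major storage, Lower|Upper); the dot products and
  // norms of the iteration itself stay sequential.
  Eigen::ConjugateGradient<Eigen::SparseMatrix<double, Eigen::RowMajor>,
                           Eigen::Lower | Eigen::Upper> cg;
  cg.compute(A);
  Eigen::VectorXd x = cg.solve(b);
  return 0;
}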
Multi-threading of those vector operations _is_ available in the Tensor library, if you use a ThreadPoolDevice for evaluation; see, for example, https://gitlab.com/libeigen/eigen/-/blob/master/unsupported/test/cxx11_tensor_thread_pool.cpp#L50 (a minimal sketch is also appended at the bottom of this mail). However, adding this type of general multi-threaded evaluation to matrix expressions in Eigen Core by unifying it with Tensor is a much longer-term project that nobody is working on, AFAICT, although it has been on the wishlist for a long time.

Best regards,
Rasmus

On Mon, Sep 13, 2021 at 11:52 PM Lorenzo Botti <[email protected]> wrote:

> Dear all,
> I hope that this new implementation also improves the scaling of the Eigen
> solvers, e.g. conjugate gradient and biconjugate gradient. What I usually
> get is a factor-2 speed-up, independent of the number of threads.
>
> Thanks for the effort.
> Best regards.
> Lorenzo
>
> On Mon, Sep 13, 2021, 17:52 Rasmus Munk Larsen <[email protected]> wrote:
>
>> Hi Peter,
>>
>> I would recommend that you hold off on this change for a bit. We are
>> planning on moving the non-blocking threadpool from unsupported into core
>> in the near future and using that as the basis for multi-threading without
>> depending on OpenMP.
>>
>> Rasmus
>>
>> On Mon, Sep 13, 2021 at 5:21 AM Peter <[email protected]> wrote:
>>
>>> Dear all,
>>>
>>> I'm currently playing with the sparse matrix implementation within Eigen,
>>> namely Eigen::SparseMatrix<double, Eigen::ColMajor> combined with a
>>> self-adjoint view.
>>>
>>> In my application I need the multiplication of a symmetric sparse matrix
>>> with a dense matrix, where the dimensions of the matrices are of the
>>> order of a few tens of thousands.
>>>
>>> I tried to parallelize the dot product in SparseSelfAdjointView.h by
>>> copying an OpenMP directive from other parts of Eigen, and I also tried
>>> to parallelize the outer loop directly with std::execution::par, see below.
>>>
>>> In summary, I do not see any effect of the parallelization.
>>> Before digging deeper into it and building a minimal working example,
>>> has someone already achieved this?
>>>
>>> Could one actually call the corresponding MKL routine directly, or are
>>> the internal storage schemes different?
>>>
>>> Best regards
>>> Peter
>>>
>>> P.S.: That's what I tried as a diff to the current master branch:
>>>
>>> :~/Eigen/eigen/Eigen/src/SparseCore$ git diff SparseSelfAdjointView.h
>>> diff --git a/Eigen/src/SparseCore/SparseSelfAdjointView.h b/Eigen/src/SparseCore/SparseSelfAdjointView.h
>>> index 0302ef3a4..91e7d495b 100644
>>> --- a/Eigen/src/SparseCore/SparseSelfAdjointView.h
>>> +++ b/Eigen/src/SparseCore/SparseSelfAdjointView.h
>>> @@ -10,6 +10,11 @@
>>>  #ifndef EIGEN_SPARSE_SELFADJOINTVIEW_H
>>>  #define EIGEN_SPARSE_SELFADJOINTVIEW_H
>>>
>>> +#include <iostream> /// only for debugging
>>> +#include <boost/range/irange.hpp>
>>> +
>>> +#include <execution>
>>> +
>>>  #include "./InternalHeaderCheck.h"
>>>
>>>  namespace Eigen {
>>> @@ -295,8 +300,19 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>>>    SparseLhsTypeNested lhs_nested(lhs);
>>>    LhsEval lhsEval(lhs_nested);
>>>
>>> +  Eigen::initParallel();
>>> +  Index threads = Eigen::nbThreads();
>>> +
>>> +  std::cout << "\ndot threads: " << threads << " rhs-cols: " << rhs.cols() << std::endl;
>>> +
>>> +  // #pragma omp parallel for
>>> +  // #pragma omp parallel for schedule(dynamic,(rhs.cols()+threads*4-1)/(threads*4)) num_threads(threads)
>>> +  // for (Index k=0; k<rhs.cols(); ++k)
>>> +
>>> +  auto r = boost::irange(rhs.cols());
>>> +
>>> +  std::for_each(std::execution::par, r.begin(), r.end(), [&](Index k)
>>>    // work on one column at once
>>> -  for (Index k=0; k<rhs.cols(); ++k)
>>>    {
>>>      for (Index j=0; j<lhs.outerSize(); ++j)
>>>      {
>>> @@ -330,6 +346,7 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>>>        res.coeffRef(j,k) += alpha * i.value() * rhs.coeff(j,k);
>>>      }
>>>    }
>>> +  );
>>>  }
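
P.S. Here is the ThreadPoolDevice sketch mentioned above. It is untested and the sizes and thread counts are arbitrary; note that EIGEN_USE_THREADS has to be defined before including the Tensor header for the device to be available:

#define EIGEN_USE_THREADS
#include <unsupported/Eigen/CXX11/Tensor>

int main()
{
  // A thread pool shared by all .device(...) evaluations below.
  Eigen::ThreadPool pool(4);
  Eigen::ThreadPoolDevice device(&pool, 4);

  Eigen::Tensor<float, 2> a(2000, 2000), b(2000, 2000), c(2000, 2000);
  a.setRandom();
  b.setRandom();

  // Assigning through .device(device) evaluates the expression on the pool.
  c.device(device) = a + b;

  // Contractions (matrix products) are dispatched to the pool as well.
  Eigen::array<Eigen::IndexPair<int>, 1> dims = {{ Eigen::IndexPair<int>(1, 0) }};
  Eigen::Tensor<float, 2> d(2000, 2000);
  d.device(device) = a.contract(b, dims);
  return 0;
}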

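And regarding the diff in Peter's mail: the same column-level parallelism can be tried from the outside, without patching Eigen, by splitting the product over the right-hand-side columns yourself. Another untested sketch (it assumes a working C++17 parallel-algorithms setup, e.g. GCC with libstdc++ and TBB; the matrix here is just a placeholder):

#include <Eigen/Sparse>
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main()
{
  using SpMat = Eigen::SparseMatrix<double, Eigen::ColMajor>;
  const int n = 20000, m = 64;

  // A symmetric sparse matrix, stored as its lower triangle only.
  std::vector<Eigen::Triplet<double>> coeffs;
  for (int i = 0; i < n; ++i) {
    coeffs.emplace_back(i, i, 4.0);
    if (i + 1 < n) coeffs.emplace_back(i + 1, i, -1.0);
  }
  SpMat A(n, n);
  A.setFromTriplets(coeffs.begin(), coeffs.end());

  Eigen::MatrixXd rhs = Eigen::MatrixXd::Random(n, m);
  Eigen::MatrixXd res(n, m);

  // Every column of the result is independent: each task reads A and rhs
  // and writes only its own column of res, so the outer loop over
  // right-hand-side columns can run in parallel.
  std::vector<int> cols(m);
  std::iota(cols.begin(), cols.end(), 0);
  std::for_each(std::execution::par, cols.begin(), cols.end(), [&](int k) {
    res.col(k) = A.selfadjointView<Eigen::Lower>() * rhs.col(k);
  });
  return 0;
}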