Dear all,

I hope that this new implementation also improves the scaling of Eigen's solvers, e.g. conjugate gradient and biconjugate gradient. What I usually get is a factor-2 speed-up, independent of the number of threads.
Thanks for the effort.

Best regards,
Lorenzo

On Mon, Sep 13, 2021, 17:52 Rasmus Munk Larsen <[email protected]> wrote:

> Hi Peter,
>
> I would recommend that you hold off on this change for a bit. We are
> planning on moving the non-blocking threadpool from unsupported into
> core in the near future and using that as the basis for multi-threading
> without depending on OpenMP.
>
> Rasmus
>
> On Mon, Sep 13, 2021 at 5:21 AM Peter <[email protected]> wrote:
>
>> Dear all,
>>
>> I'm currently playing with the sparse matrix implementation within
>> Eigen, namely Eigen::SparseMatrix<double, Eigen::ColMajor> combined
>> with a self-adjoint view.
>>
>> In my application I need the product of a symmetric sparse matrix with
>> a dense matrix, where the dimensions of the matrices are of the order
>> of up to a few tens of thousands.
>>
>> I tried to parallelize the dot product in SparseSelfAdjointView.h by
>> copying an omp directive from other parts of Eigen, and I also tried
>> to parallelize the outer loop directly with std::execution::par; see
>> below.
>>
>> In summary, I do not see any effect from the parallelization. Before
>> digging deeper into it and building a minimal working example: has
>> someone already achieved this?
>>
>> Could one actually call the corresponding MKL routine directly, or are
>> the internal storage schemes different?
>>
>> Best regards,
>> Peter
>>
>> P.S.: This is what I tried, as a diff against the current master branch:
>>
>> :~/Eigen/eigen/Eigen/src/SparseCore$ git diff SparseSelfAdjointView.h
>> diff --git a/Eigen/src/SparseCore/SparseSelfAdjointView.h b/Eigen/src/SparseCore/SparseSelfAdjointView.h
>> index 0302ef3a4..91e7d495b 100644
>> --- a/Eigen/src/SparseCore/SparseSelfAdjointView.h
>> +++ b/Eigen/src/SparseCore/SparseSelfAdjointView.h
>> @@ -10,6 +10,11 @@
>>  #ifndef EIGEN_SPARSE_SELFADJOINTVIEW_H
>>  #define EIGEN_SPARSE_SELFADJOINTVIEW_H
>>
>> +#include <iostream> /// only for debugging
>> +#include <boost/range/irange.hpp>
>> +
>> +#include <execution>
>> +
>>  #include "./InternalHeaderCheck.h"
>>
>>  namespace Eigen {
>> @@ -295,8 +300,19 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>>    SparseLhsTypeNested lhs_nested(lhs);
>>    LhsEval lhsEval(lhs_nested);
>>
>> +  Eigen::initParallel();
>> +  Index threads = Eigen::nbThreads();
>> +
>> +  std::cout << "\ndot threads: "<< threads << " rhs-cols: " << rhs.cols() << std::endl;
>> +
>> +  // #pragma omp parallel for
>> +  // #pragma omp parallel for schedule(dynamic,(rhs.cols()+threads*4-1)/(threads*4)) num_threads(threads)
>> +  // for (Index k=0; k<rhs.cols(); ++k)
>> +
>> +  auto r = boost::irange(rhs.cols());
>> +
>> +  std::for_each(std::execution::par, r.begin(), r.end(), [&](Index k)
>>    // work on one column at once
>> -  for (Index k=0; k<rhs.cols(); ++k)
>>    {
>>      for (Index j=0; j<lhs.outerSize(); ++j)
>>      {
>> @@ -330,6 +346,7 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>>          res.coeffRef(j,k) += alpha * i.value() * rhs.coeff(j,k);
>>        }
>>      }
>> +  );
>>  }
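A minimal working example of the kind Peter mentions building might look like the sketch below. It is not from the thread: it times Eigen's stock selfadjoint sparse-times-dense product against a column-wise OpenMP parallelization that needs no patch to Eigen's internals. The dimension n, the RHS width m, the fill density, and the dynamic schedule are illustrative assumptions, and it assumes a compiler with OpenMP enabled.

// Hypothetical benchmark sketch (not from the thread): compare Eigen's
// stock selfadjoint sparse * dense product against a column-parallel
// variant that leaves Eigen's internals untouched.
// Build with e.g.:  g++ -O3 -fopenmp -I/path/to/eigen mwe.cpp
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>

int main() {
  using SpMat = Eigen::SparseMatrix<double, Eigen::ColMajor>;
  const Eigen::Index n = 20000;  // matrix dimension (illustrative)
  const Eigen::Index m = 64;     // number of right-hand-side columns
  Eigen::initParallel();

  // Random symmetric matrix, stored as its upper triangle only.
  std::vector<Eigen::Triplet<double>> trips;
  std::srand(42);
  for (Eigen::Index e = 0; e < 20 * n; ++e) {
    Eigen::Index i = std::rand() % n, j = std::rand() % n;
    if (i > j) std::swap(i, j);  // keep entries in the upper triangle
    trips.emplace_back(i, j, double(std::rand()) / RAND_MAX);
  }
  SpMat A(n, n);
  A.setFromTriplets(trips.begin(), trips.end());  // duplicates are summed

  Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, m);
  Eigen::MatrixXd C1(n, m), C2(n, m);

  auto t0 = std::chrono::steady_clock::now();
  C1 = A.selfadjointView<Eigen::Upper>() * B;  // the product in question
  auto t1 = std::chrono::steady_clock::now();

  // Column-wise parallelization without patching Eigen: each thread
  // writes a disjoint set of columns of C2, so there are no data races.
  auto t2 = std::chrono::steady_clock::now();
#pragma omp parallel for schedule(dynamic)
  for (Eigen::Index k = 0; k < m; ++k)
    C2.col(k) = A.selfadjointView<Eigen::Upper>() * B.col(k);
  auto t3 = std::chrono::steady_clock::now();

  std::cout << "stock product:   " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
            << "column-parallel: " << std::chrono::duration<double>(t3 - t2).count() << " s\n"
            << "max abs diff:    " << (C1 - C2).cwiseAbs().maxCoeff() << "\n";
}

If the column-parallel loop shows the same flat scaling Peter reports, the kernel is plausibly memory-bandwidth-bound, in which case extra threads cannot help much regardless of how the loop is parallelized.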

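Regarding the MKL question: an Eigen::SparseMatrix<double, Eigen::ColMajor> in compressed mode uses plain CSC storage, so its three internal arrays can in principle be handed to MKL's inspector-executor sparse BLAS without copying. The sketch below is an assumption-laden illustration, not an Eigen feature: the helper name mkl_symm_spmm is hypothetical, it presumes an LP64 build (MKL_INT == int, matching Eigen's default StorageIndex), and it treats A as storing only its upper triangle.

// Hedged sketch, not an Eigen feature: hand an Eigen compressed-CSC matrix
// to MKL's inspector-executor sparse BLAS. Assumes an LP64 build so that
// MKL_INT == int, matching Eigen's default StorageIndex, and a matrix that
// stores only its upper triangle. mkl_symm_spmm is a hypothetical helper.
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <mkl_spblas.h>

// C = A * B with A symmetric and only its upper triangle stored.
void mkl_symm_spmm(Eigen::SparseMatrix<double, Eigen::ColMajor>& A,
                   const Eigen::MatrixXd& B, Eigen::MatrixXd& C) {
  A.makeCompressed();  // MKL expects the compressed (3-array) form

  sparse_matrix_t mklA;
  mkl_sparse_d_create_csc(&mklA, SPARSE_INDEX_BASE_ZERO,
                          (MKL_INT)A.rows(), (MKL_INT)A.cols(),
                          A.outerIndexPtr(),      // cols_start
                          A.outerIndexPtr() + 1,  // cols_end
                          A.innerIndexPtr(),      // row indices
                          A.valuePtr());

  matrix_descr descr;
  descr.type = SPARSE_MATRIX_TYPE_SYMMETRIC;
  descr.mode = SPARSE_FILL_MODE_UPPER;  // only the upper triangle is stored
  descr.diag = SPARSE_DIAG_NON_UNIT;

  C.resize(A.rows(), B.cols());
  mkl_sparse_d_mm(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, mklA, descr,
                  SPARSE_LAYOUT_COLUMN_MAJOR,
                  B.data(), (MKL_INT)B.cols(), (MKL_INT)B.rows(),
                  0.0, C.data(), (MKL_INT)C.rows());
  mkl_sparse_destroy(mklA);
}

mkl_sparse_d_mm is threaded by MKL's own runtime, which sidesteps the question of parallelizing Eigen's kernel; error handling of the returned sparse_status_t values is omitted for brevity.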