Hi Peter,

I would recommend that you hold off on this change for a bit. We are planning to move the non-blocking ThreadPool from unsupported into core in the near future, and to use that as the basis for multi-threading without depending on OpenMP.
Rasmus

On Mon, Sep 13, 2021 at 5:21 AM Peter <[email protected]> wrote:
> Dear all,
>
> I'm currently playing with the sparse matrix implementation within Eigen,
> namely Eigen::SparseMatrix<double, Eigen::ColMajor> combined with a
> self-adjoint view.
>
> In my application I need the product of a symmetric sparse matrix with a
> dense matrix, where the matrix dimensions are on the order of up to a few
> tens of thousands.
>
> I tried to parallelize the dot product in SparseSelfAdjointView.h by
> copying an OpenMP directive from other parts of Eigen, and I also tried to
> parallelize the outer loop directly with std::execution::par; see below.
>
> In summary, I do not see any effect from the parallelization.
> Before digging deeper into it and building a minimal working example:
> has someone already achieved this?
>
> Could one directly call the corresponding MKL routine, or are the
> internal storage schemes different?
>
> Best regards,
> Peter
>
> P.S.: This is what I tried, as a diff against the current master branch:
>
> :~/Eigen/eigen/Eigen/src/SparseCore$ git diff SparseSelfAdjointView.h
> diff --git a/Eigen/src/SparseCore/SparseSelfAdjointView.h b/Eigen/src/SparseCore/SparseSelfAdjointView.h
> index 0302ef3a4..91e7d495b 100644
> --- a/Eigen/src/SparseCore/SparseSelfAdjointView.h
> +++ b/Eigen/src/SparseCore/SparseSelfAdjointView.h
> @@ -10,6 +10,11 @@
>  #ifndef EIGEN_SPARSE_SELFADJOINTVIEW_H
>  #define EIGEN_SPARSE_SELFADJOINTVIEW_H
>
> +#include <iostream> /// only for debugging
> +#include <boost/range/irange.hpp>
> +
> +#include <execution>
> +
>  #include "./InternalHeaderCheck.h"
>
>  namespace Eigen {
> @@ -295,8 +300,19 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>    SparseLhsTypeNested lhs_nested(lhs);
>    LhsEval lhsEval(lhs_nested);
>
> +  Eigen::initParallel();
> +  Index threads = Eigen::nbThreads();
> +
> +  std::cout << "\ndot threads: " << threads << " rhs-cols: " << rhs.cols() << std::endl;
> +
> +  // #pragma omp parallel for
> +  // #pragma omp parallel for schedule(dynamic,(rhs.cols()+threads*4-1)/(threads*4)) num_threads(threads)
> +  // for (Index k=0; k<rhs.cols(); ++k)
> +
> +  auto r = boost::irange(rhs.cols());
> +
> +  std::for_each(std::execution::par, r.begin(), r.end(), [&](Index k)
>    // work on one column at once
> -  for (Index k=0; k<rhs.cols(); ++k)
>    {
>      for (Index j=0; j<lhs.outerSize(); ++j)
>      {
> @@ -330,6 +346,7 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>          res.coeffRef(j,k) += alpha * i.value() * rhs.coeff(j,k);
>        }
>      }
> +  );
>  }
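
For what it's worth, the idea in Peter's patch — the columns of the result are independent, so the outer loop over k can be distributed across threads without locking — can be demonstrated outside of Eigen. Below is a minimal self-contained sketch, not Eigen's internals: a symmetric sparse matrix stored as its upper triangle in CSC form, multiplied by a column-major dense matrix, with the column loop split over a fixed set of std::threads. The names `SparseSym`, `multiplyColumn`, and `symSparseTimesDense` are hypothetical, invented for this illustration.

```cpp
// Sketch only: parallelizing a symmetric-sparse * dense product over the
// columns of the dense right-hand side. Each result column is written by
// exactly one thread, so no synchronization is needed inside the loop.
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

struct SparseSym {           // upper triangle of a symmetric matrix, CSC
  int n;                     // dimension (matrix is n x n)
  std::vector<int> colPtr;   // size n+1; entries of column j are [colPtr[j], colPtr[j+1])
  std::vector<int> rowIdx;   // row index of each stored entry (rowIdx[p] <= j)
  std::vector<double> val;   // value of each stored entry
};

// res(:,k) += A * rhs(:,k) for a single column k (column-major dense storage).
static void multiplyColumn(const SparseSym& A, const std::vector<double>& rhs,
                           std::vector<double>& res, int k) {
  const double* x = &rhs[(size_t)k * A.n];
  double* y = &res[(size_t)k * A.n];
  for (int j = 0; j < A.n; ++j) {
    for (int p = A.colPtr[j]; p < A.colPtr[j + 1]; ++p) {
      const int i = A.rowIdx[p];
      const double v = A.val[p];
      y[i] += v * x[j];              // stored upper-triangular entry (i, j)
      if (i != j) y[j] += v * x[i];  // its mirrored lower-triangular entry (j, i)
    }
  }
}

// Distribute the 'cols' result columns over nThreads contiguous blocks.
void symSparseTimesDense(const SparseSym& A, const std::vector<double>& rhs,
                         std::vector<double>& res, int cols, int nThreads) {
  std::vector<std::thread> workers;
  const int block = (cols + nThreads - 1) / nThreads;
  for (int t = 0; t < nThreads; ++t) {
    const int begin = t * block;
    const int end = std::min(cols, begin + block);
    if (begin >= end) break;
    workers.emplace_back([&A, &rhs, &res, begin, end] {
      for (int k = begin; k < end; ++k) multiplyColumn(A, rhs, res, k);
    });
  }
  for (auto& w : workers) w.join();
}
```

This mirrors the structure of the patched loop (one column of `res` per task); whether it actually beats the serial version depends on the matrix sizes and on the product being memory-bound, which may explain the lack of observed speedup.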
