Hi Lorenzo,

Thanks for the input. For Krylov subspace solvers, most of the work is
usually in the matrix-vector multiplies (and possibly the preconditioner,
if you provide your own), so I'm not sure what you mean by scaling in
this case. Are you running, e.g., conjugate gradients on a dense
Eigen::Matrix (I assume not) or on a sparse Eigen::SparseMatrix? In the
latter case, most of the scaling would come from the sparse matrix-vector
product. The vector operations (dot products, norms, etc.) in the Krylov
solvers are not multi-threaded in Eigen, AFAICT.

This type of multi-threading _is_ available in the Tensor library if you
use a ThreadPoolDevice for evaluation; see, for example,
https://gitlab.com/libeigen/eigen/-/blob/master/unsupported/test/cxx11_tensor_thread_pool.cpp#L50
However, adding this kind of general multi-threaded evaluation to matrix
expressions in Eigen Core by unifying it with Tensor is a much
longer-term project that nobody is working on, AFAICT, although it has
been on the wishlist for a long time.

Best regards,
  Rasmus

On Mon, Sep 13, 2021 at 11:52 PM Lorenzo Botti <[email protected]>
wrote:

> Dear all,
> I hope that this new implementation also improves the scaling of the
> Eigen solvers, e.g. conjugate gradient and biconjugate gradient. What I
> usually get is a factor-of-2 speedup, independent of the number of
> threads.
>
> Thanks for the effort.
> Best regards.
> Lorenzo
>
>
>
> On Mon, Sep 13, 2021, 17:52 Rasmus Munk Larsen <[email protected]>
> wrote:
>
>> Hi Peter,
>>
>> I would recommend that you hold off on this change for a bit. We are
>> planning to move the non-blocking threadpool from unsupported into core
>> in the near future and to use it as the basis for multi-threading
>> without depending on OpenMP.
>>
>> Rasmus
>>
>> On Mon, Sep 13, 2021 at 5:21 AM Peter <[email protected]> wrote:
>>
>>> Dear all,
>>>
>>> I'm currently playing with the sparse matrix implementation within
>>> Eigen, namely Eigen::SparseMatrix<double, Eigen::ColMajor> combined
>>> with a self-adjoint view.
>>>
>>> In my application I need the multiplication of a symmetric sparse
>>> matrix with a dense matrix, where the dimensions of the matrices are
>>> of the order of up to a few tens of thousands.
>>>
>>> I tried to parallelize the dot product in SparseSelfAdjointView.h by
>>> copying an OpenMP directive from other parts of Eigen, and I tried to
>>> parallelize the outer loop directly with std::execution::par, see
>>> below.
>>>
>>> In summary, I do not see any effect from the parallelization.
>>> Before digging deeper into it and building a minimal working example:
>>> has someone already achieved this?
>>>
>>> Could one actually call the corresponding MKL routine directly, or
>>> are the internal storage schemes different?
>>>
>>> Best regards
>>> Peter
>>>
>>> P.S.: That's what I tried as a diff to the current master branch:
>>>
>>> :~/Eigen/eigen/Eigen/src/SparseCore$ git diff SparseSelfAdjointView.h
>>> diff --git a/Eigen/src/SparseCore/SparseSelfAdjointView.h b/Eigen/src/SparseCore/SparseSelfAdjointView.h
>>> index 0302ef3a4..91e7d495b 100644
>>> --- a/Eigen/src/SparseCore/SparseSelfAdjointView.h
>>> +++ b/Eigen/src/SparseCore/SparseSelfAdjointView.h
>>> @@ -10,6 +10,11 @@
>>>  #ifndef EIGEN_SPARSE_SELFADJOINTVIEW_H
>>>  #define EIGEN_SPARSE_SELFADJOINTVIEW_H
>>>
>>> +#include <iostream> /// only for debugging
>>> +#include <boost/range/irange.hpp>
>>> +
>>> +#include <execution>
>>> +
>>>  #include "./InternalHeaderCheck.h"
>>>
>>>  namespace Eigen {
>>> @@ -295,8 +300,19 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>>>    SparseLhsTypeNested lhs_nested(lhs);
>>>    LhsEval lhsEval(lhs_nested);
>>>
>>> +  Eigen::initParallel();
>>> +  Index threads = Eigen::nbThreads();
>>> +
>>> +  std::cout << "\ndot threads: "<< threads << " rhs-cols: " << rhs.cols() << std::endl;
>>> +
>>> +  // #pragma omp parallel for
>>> +  // #pragma omp parallel for schedule(dynamic,(rhs.cols()+threads*4-1)/(threads*4)) num_threads(threads)
>>> +  // for (Index k=0; k<rhs.cols(); ++k)
>>> +
>>> +  auto r = boost::irange(rhs.cols());
>>> +
>>> +  std::for_each(std::execution::par, r.begin(), r.end(),  [&](Index k)
>>>    // work on one column at once
>>> -  for (Index k=0; k<rhs.cols(); ++k)
>>>    {
>>>      for (Index j=0; j<lhs.outerSize(); ++j)
>>>      {
>>> @@ -330,6 +346,7 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
>>>          res.coeffRef(j,k) += alpha * i.value() * rhs.coeff(j,k);
>>>      }
>>>    }
>>> +  );
>>>  }
>>>
>>>
>>>
>>>
