On Sat, Mar 9, 2019 at 6:28 PM Jack Poulson <[email protected]> wrote:
> Hi Gael,
>
> Interesting! I hadn't run a comparison against CholMod yet (but it is
> obviously the gold standard). There is a slight amount of automatic
> algorithmic switching that needs to be added in
> (https://gitlab.com/hodge_star/catamari/issues/2) before I would assume a
> blackbox comparison across the gamut of matrix types would be competitive.
> Catamari currently implements:
> * Sequential and multithreaded, supernodal right-looking (multifrontal)
>   Cholesky, LDL^T, and LDL^H,
> * Sequential and multithreaded, supernodal left-looking Cholesky, LDL^T,
>   and LDL^H,
> * Sequential scalar left-looking Cholesky, LDL^T, and LDL^H,
> * Sequential scalar up-looking Cholesky, LDL^T, and LDL^H.
>
> At the moment, the first category is used by default, but there are more
> refined categories:
> * For large, reasonably dense, multithreaded environments, the supernodal
>   right-looking approach is preferred.
> * For large, reasonably dense, sequential environments, the supernodal
>   left-looking approach might be preferred.
> * For small/sparse problems, the up-looking approach should be preferred,
>   but there is not yet parallel support for this in catamari (not that it
>   would be difficult).

A parallel simplicial solver would be nice, and would be the first publicly
available implementation AFAIK.

> According to CholMod's publications, it intelligently switches between the
> supernodal left-looking factorization and the scalar up-looking
> factorization (based upon the computational intensity as determined from
> the symbolic factorization). For higher core counts, the supernodal
> right-looking approach becomes critical so that the top of the assembly
> tree is parallelized.

I did not know that CholMod was supposed to be smart in this regard. In my
own experience I always had to explicitly choose between the simplicial and
supernodal modes to get the best performance (and for the simplicial mode I
always use our built-in one, of course!).
> On the subject of Eigen integration: I would defer to Eigen experts on the
> best/most-useful means of integration. If you believe that there would be
> added usefulness in transplanting Catamari's logic into core Eigen (on top
> of its BLAS and matrix classes), I think that could be interesting.

That would definitely be the ideal option, as duplicating code and effort is
rarely a good idea. Of course, it is also more work in the short term.

> I believe that my permissively-licensed AMD implementation (quotient)
> should be roughly equivalent to SuiteSparse's (and, I suppose, Eigen's) in
> terms of performance.

Thanks to Google, we very recently managed to relicense SuiteSparse's AMD as
MPL2. But the code is still horrible to us C++ developers, so yours might
still be very desirable if it is on par in terms of performance and
robustness.

About AMD, I slightly modified our copy to support matrices with explicit
zeros on the diagonal: those must be moved to, or kept at, the bottom-right
corner. A typical use case is equality constraints handled through Lagrange
multipliers.

> A wrapper might be a good first step, but this conversation has gotten me
> thinking that improving the algorithmic switching logic is a priority.

Yes, starting with a wrapper would be a good way to make catamari visible and
easily testable/usable by the Eigen community. If we envision an integration
within Eigen's official modules, though, a large refactoring will be
required.

Gaël.

> With Thanks,
> Jack
>
> On Sat, Mar 9, 2019, at 2:31 AM, Gael Guennebaud wrote:
>
> Hi Jack,
>
> This is a very nice piece of work, congrats! I did one benchmark of
> catamari+MKL on the "ldoor" matrix [1] and got the same speed as
> suitesparse/Cholmod: about 8s on my Haswell quad-core (15s without OpenMP).
>
> Requiring C++14 for new features is not a big deal anymore.
>
> How do you envision this "Eigen/Catamari" module? A simple wrapper, as we
> do with Cholmod/UmfPack/Pastix/Pardiso, or a built-in Eigen-ified version?
> As you probably already figured out, Eigen already provides several
> features redundant with what you developed in Catamari:
> - a BLAS/LAPACK layer with optional compile-time fallback to any
>   BLAS/LAPACKE/MKL,
> - sparse storage,
> - fill-in reorderings: we have AMD and COLAMD, so yours might rather be
>   complementary.
>
> Gaël
>
> [1] https://sparse.tamu.edu/GHS_psdef/ldoor
>
> On Fri, Mar 8, 2019 at 4:59 PM Jack Poulson <[email protected]> wrote:
>
> > Dear Eigen Maintainers,
> >
> > I noticed that Eigen's current documentation only lists a simplicial
> > LDLT sparse-direct solver; I spent my last several months building a
> > header-only C++ (optionally multithreaded) supernodal
> > Cholesky/LDL^T/LDL^H sparse-direct solver (which also implements
> > high-performance Determinantal Point Process sampling with similar
> > techniques) and am curious if it would be seen as a good fit for an
> > unsupported Eigen module. The solver achieves as much as 1 TF in
> > double-precision on my 16-core Skylake workstation.
> >
> > The current project is described here: https://hodgestar.com/catamari/
> > I released v0.1 a few days ago, but I have improved the performance in
> > 'master' substantially since then.
> >
> > My assumption is that the biggest hurdle will be due to the solver
> > (catamari) being implemented in C++14, and that Eigen is currently
> > restricted to C++03. And, while catamari optionally incorporates OpenMP
> > task scheduling for the multithreaded parallelism, this support is only
> > enabled if an appropriate preprocessor directive is defined
> > (CATAMARI_OPENMP).
> >
> > Any advice would be appreciated.
> >
> > Sincerely,
> > Jack Poulson
