Thanks for the pointer to the code.  I see that the choice of default
solver has to consider a lot of factors.

On 03/01/2015 12:34 PM, Jan Blechta wrote:
PETSc LU/Cholesky is the last fallback option when picking the default
LU solver. Which solver ends up as the default depends on a few
circumstances; see
https://bitbucket.org/fenics-project/dolfin/src/21ba7f92a2acb2f45351d47eca6658a24e7affc2/dolfin/la/PETScLUSolver.cpp?at=master#cl-337

If you are asking about automatic recognition of problem symmetry (and
picking Cholesky instead of LU), I guess this is possible in UFL, but
further care is needed in DOLFIN; see for instance
https://bitbucket.org/fenics-project/dolfin/issue/225. But I would not
hurry with this until the fix for
https://bitbucket.org/petsc/petsc/issue/81 is widespread. At the moment
it is released only in PETSc 3.5.3.
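
Just to illustrate the idea, here is a crude runtime check of form
symmetry; this is not the symbolic UFL recognition I mean above, the
function name and tolerance are arbitrary, and the dense comparison is
only sensible on small test meshes:

# Crude runtime check of bilinear-form symmetry: assemble a and
# adjoint(a) and compare the resulting matrices entrywise.
# Only a sketch; a real implementation would work symbolically in UFL
# or compare the operators without densifying them.
from dolfin import *

def form_is_symmetric(a, tol=1e-12):
    A  = assemble(a)            # operator of a(u, v)
    At = assemble(adjoint(a))   # operator of a(v, u)
    return abs(A.array() - At.array()).max() < tol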

Jan


On Sun, 01 Mar 2015 12:11:43 -0600
Douglas N Arnold <[email protected]> wrote:

Thanks to Jan, Miro, and Garth for this informative discussion.

It seems to me that part of the philosophy behind FEniCS is to give
users fine-grained control of the algorithms while providing good
default values.  I suspect that a large portion of FEniCS programs
solve SPD problems with the default LU solver, which, it seems from
this discussion, is not really a good default.  Would it not be better
to choose another default, e.g., if linear_solver is not set, default
to mumps?  That is not only a lot faster, but also avoids the totally
unexpected behavior that telling the solver the matrix is symmetric
slows it down considerably.
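
In the meantime, a user who knows about the issue can opt in to MUMPS
explicitly.  A minimal sketch (boundary conditions omitted; I am
assuming solve() forwards the 'symmetric' parameter the same way
LinearVariationalSolver does in the script further down):

# Explicitly request MUMPS (with Cholesky when the operator is
# symmetric) instead of relying on the default LU backend.
from dolfin import *

mesh = UnitCubeMesh(20, 20, 20)
V = FunctionSpace(mesh, 'CG', 1)
u, v = TrialFunction(V), TestFunction(V)
a = (dot(grad(u), grad(v)) + u*v)*dx
L = v*dx

uh = Function(V)
solve(a == L, uh, solver_parameters={'linear_solver': 'mumps',
                                     'symmetric': True})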

On 03/01/2015 11:50 AM, Jan Blechta wrote:
On Sun, 1 Mar 2015 17:48:42 +0000
Jan Blechta <[email protected]> wrote:

For the sake of objectivity, it should be noted that PETSc LU and
Cholesky perform so badly in DOLFIN because no reordering is employed;
see
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFactorSetMatOrderingType.html
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatOrderingType.html
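
For reference, the ordering can in principle be requested through
PETSc's options database; whether our PETScLUSolver actually reads
these options (i.e. calls KSP/PCSetFromOptions) is another question,
so take this only as an illustration of the PETSc options referenced
above, not a verified DOLFIN recipe:

# Request a fill-reducing ordering for PETSc's built-in factorization
# via the options database (equivalent command-line form:
# -pc_factor_mat_ordering_type nd).
from dolfin import PETScOptions

PETScOptions.set("pc_factor_mat_ordering_type", "nd")  # nested dissection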

Also note that in DOLFIN we manipulate the MUMPS and SuperLU_dist
ordering, see
https://bitbucket.org/fenics-project/dolfin/src/21ba7f92a2acb2f45351d47eca6658a24e7affc2/dolfin/common/SubSystemsManager.cpp?at=master#cl-191
to work around some problems with SCOTCH and/or ParMETIS and/or 64-bit
PetscInt.  MUMPS's automatic choice ICNTL(7) == 7 may perform better.
This hack is a little bit unpredictable because the code is only
executed if DOLFIN takes care of PETSc initialization.
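
If someone wants to experiment, ICNTL(7) can be overridden through the
options database as well; -mat_mumps_icntl_7 is a standard PETSc/MUMPS
option, but whether it wins over the value DOLFIN sets depends on the
initialization order, so treat this as a sketch:

# Ask MUMPS to choose the ordering automatically (ICNTL(7) = 7) via
# PETSc's options database, overriding the value hard-coded by DOLFIN.
from dolfin import PETScOptions

PETScOptions.set("mat_mumps_icntl_7", 7)  # 7 = let MUMPS pick the ordering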

Jan


On Sun, 01 Mar 2015 11:15:19 -0600
Douglas N Arnold <[email protected]> wrote:

I did some testing to reproduce Miro's results.  Here are times for a
symmetric Poisson-like problem on an N x N x N mesh of the unit cube
with CG1 elements.  If I use the MUMPS solver, setting
solver.parameters['symmetric'] to True, and so using Cholesky, results
in a modest memory saving and a solve that is about 25%-30% faster.
If I use PETSc instead of MUMPS, setting symmetric to True results in
a modest *increase* in memory and a *huge* increase in solve time.
For example, on a 40 x 40 x 40 mesh the PETSc LU solver takes 37
seconds while the PETSc Cholesky solver takes 198 seconds.  PETSc
takes 5 times longer than MUMPS for the LU solve and a whopping 39
times longer for the Cholesky solve.

As Garth indicated, if you want to solve large systems with a direct
solver, don't use PETSc's built-in one (and, if you must, don't tell
PETSc that your matrix is symmetric).

Here are the actual timings:

Solution of Poisson problem, CG1 on N x N x N mesh of the cube
(times in seconds)

                     symmetric=False  symmetric=True
       N             time  memory     time  memory
MUMPS
       20  mumps     0.3    942976    0.2    937212
       22  mumps     0.4    967744    0.4    947736
       24  mumps     0.6    992196    0.5    964740
       26  mumps     1.0   1032872    0.8    992284
       28  mumps     1.3   1078404    1.1   1025716
       30  mumps     1.9   1141312    1.4   1065532
       32  mumps     2.4   1186708    1.8   1095176
       34  mumps     3.1   1262252    2.4   1141552
       36  mumps     4.8   1374788    3.4   1221876
       38  mumps     5.4   1456412    3.9   1269188
       40  mumps     7.3   1582028    5.1   1351760
PETSc
       20  petsc     0.6    950972    2.1    950656
       22  petsc     1.1    976664    3.8    979100
       24  petsc     1.8   1007896    6.4   1015736
       26  petsc     2.9   1098964   11.0   1067084
       28  petsc     4.5   1149148   17.8   1136260
       30  petsc     6.7   1242468   28.0   1222572
       32  petsc     9.8   1322416   43.0   1322780
       34  petsc    13.9   1333408   64.8   1457924
       36  petsc    19.7   1440184   95.9   1621312
       38  petsc    27.3   1669508  139.4   1826692
       40  petsc    37.3   1723864  198.2   2076156

and here is the program that generated them:

# sample call:  python ch.py 50 'mumps' True
from dolfin import *
import sys

# Command-line arguments: mesh resolution, linear solver backend, symmetry flag
N = int(sys.argv[1])
method = sys.argv[2]
symmetric = (sys.argv[3] == 'True')

# Symmetric Poisson-like problem on an N x N x N mesh of the unit cube, CG1
mesh = UnitCubeMesh(N, N, N)
V = FunctionSpace(mesh, 'CG', 1)
u = TrialFunction(V)
v = TestFunction(V)
a = (dot(grad(u), grad(v)) + u*v)*dx
L = v*dx

u = Function(V)
problem = LinearVariationalProblem(a, L, u)
solver = LinearVariationalSolver(problem)
solver.parameters['linear_solver'] = method
solver.parameters['symmetric'] = symmetric   # Cholesky instead of LU when True

# Time the solve and report memory usage
timer = Timer('solver')
timer.start()
solver.solve()
tim = timer.stop()
mem = memory_usage(as_string=False)
print "{:6}  {:8} {:1} {:7.1f} {:9}".\
     format(N, method, symmetric, tim, mem[1])
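
The table above can be regenerated with a small wrapper around this
script (my own driver, assuming the script is saved as ch.py):

# Run ch.py for both backends, both symmetry settings, and N = 20, 22, ..., 40.
import subprocess

for method in ('mumps', 'petsc'):
    for symmetric in ('False', 'True'):
        for N in range(20, 41, 2):
            subprocess.call(['python', 'ch.py', str(N), method, symmetric])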


On 03/01/2015 08:58 AM, Garth N. Wells wrote:
The PETSc built-in direct solver is slow for large systems. It is
really there for cases where LU is needed as part of another
algorithm, e.g. as the coarse-level solver in multigrid.
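
For instance, something along these lines selects the built-in LU only
on the coarse grid of an algebraic multigrid preconditioner; pc_type
gamg and mg_coarse_pc_type are standard PETSc option names, but whether
DOLFIN forwards them to the underlying KSP is an assumption here:

# Use PETSc's built-in LU only as the coarse-level solver of an
# algebraic multigrid preconditioner, via the options database.
from dolfin import PETScOptions

PETScOptions.set("pc_type", "gamg")          # algebraic multigrid
PETScOptions.set("mg_coarse_pc_type", "lu")  # built-in LU on the coarse grid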

If you want to solve large systems, use one of the specialised
direct solvers.

Garth

On Sunday, 1 March 2015, Miro Kuchta <[email protected]> wrote:

      Hi,

      please consider the attached script.  Following this discussion
      <https://bitbucket.org/fenics-project/dolfin/pull-request/2/use-cholesky-rather-than-lu-decomposition/diff#chg-dolfin/fem/LinearVariationalSolver.cpp>,
      if method is mumps, petsc or pastix and we have symmetric=True,
      the linear system is solved with a Cholesky factorization (is
      this so?).  While testing different method/symmetry combinations
      I noticed that PETSc's own symmetric solver is easily 10 times
      slower than mumps (I don't have pastix to compare against).  Can
      anyone else reproduce this?  Thanks.

      Regards, Miro



--
Garth N. Wells
Department of Engineering, University of Cambridge
http://www.eng.cam.ac.uk/~gnw20


