On Tue, 29 Apr 2014 22:55:13 +0200
"Garth N. Wells" <[email protected]> wrote:

> I’ve switched the default parallel LU solver back to MUMPS and set
> MUMPS to use the AMD ordering (anything other than METIS...), which
> seems to avoid MUMPS crashing when PETSc is configured with recent
> METIS versions.

We also suffered from segfaults in METIS called by MUMPS. As I
remember, this has something to do with a library mismatch: PETSc
typically downloads its own METIS, while DOLFIN is compiled against
another one. I will ask Jaroslav Hron, who solved the issue here, and
let you know.
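
For reference, the AMD ordering can also be forced from the Python
side; a minimal sketch, assuming the DOLFIN PETScOptions interface
and that mat_mumps_icntl_7 maps to MUMPS ICNTL(7), where 0 selects
AMD:

from dolfin import PETScOptions

# ask MUMPS for the AMD fill-reducing ordering instead of letting it
# pick (Par)METIS automatically
PETScOptions.set('mat_mumps_icntl_7', 0)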

Jan

> 
> Garth
> 
> On 27 Mar 2014, at 11:52, Garth N. Wells <[email protected]> wrote:
> 
> > 
> > On 26 Mar 2014, at 18:45, Jan Blechta <[email protected]>
> > wrote:
> > 
> >> On Wed, 26 Mar 2014 17:16:13 +0100
> >> "Garth N. Wells" <[email protected]> wrote:
> >> 
> >>> 
> >>> On 26 Mar 2014, at 16:56, Jan Blechta <[email protected]>
> >>> wrote:
> >>> 
> >>>> On Wed, 26 Mar 2014 16:29:11 +0100
> >>>> "Garth N. Wells" <[email protected]> wrote:
> >>>> 
> >>>>> 
> >>>>> On 26 Mar 2014, at 16:26, Jan Blechta
> >>>>> <[email protected]> wrote:
> >>>>> 
> >>>>>> On Wed, 26 Mar 2014 16:16:25 +0100
> >>>>>> Johannes Ring <[email protected]> wrote:
> >>>>>> 
> >>>>>>> On Wed, Mar 26, 2014 at 1:39 PM, Jan Blechta
> >>>>>>> <[email protected]> wrote:
> >>>>>>>> As a follow-up to the 'Broken PETSc wrappers?' thread on
> >>>>>>>> this list, can anyone reproduce an incorrect (orders of
> >>>>>>>> magnitude off) norm with superlu_dist on the following
> >>>>>>>> example? Both in serial and parallel. Thanks,
> >>>>>>> 
> >>>>>>> This is the result I got:
> >>>>>>> 
> >>>>>>> Serial:
> >>>>>>> 
> >>>>>>> L2 norm mumps        0.611356580181
> >>>>>>> L2 norm superlu_dist 92.4733890983
> >>>>>>> 
> >>>>>>> Parallel (2 processes):
> >>>>>>> 
> >>>>>>> L2 norm mumps        0.611356580181
> >>>>>>> L2 norm superlu_dist 220.027905995
> >>>>>>> L2 norm mumps        0.611356580181
> >>>>>>> L2 norm superlu_dist 220.027905995
> >>>>>> 
> >>>>>> The superlu_dist results are obviously wrong. Do we have
> >>>>>> broken installations, or is there something wrong with the
> >>>>>> library?
> >>>>>> 
> >>>>>> In the latter case I would suggest switching the default back
> >>>>>> to MUMPS. (Additionally, MUMPS offers Cholesky factorization!)
> >>>>>> What was your motivation for switching to superlu_dist, Garth?
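> >>>>>> 
> >>>>>> For reference, a minimal sketch of picking the LU backend
> >>>>>> explicitly instead of relying on the default (assuming the
> >>>>>> DOLFIN Python API and an already assembled system A, x, b):
> >>>>>> 
> >>>>>> from dolfin import list_lu_solver_methods, solve
> >>>>>> 
> >>>>>> list_lu_solver_methods()  # print the LU methods in this build
> >>>>>> solve(A, x, b, 'mumps')   # or 'superlu_dist'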
> >>>>>> 
> >>>>> 
> >>>>> MUMPS often fails in parallel when there are global dofs, and
> >>>>> there is no indication that the MUMPS developers are willing to
> >>>>> fix bugs.
> >>>> 
> >>>> I'm not sure what you mean by 'MUMPS fails'.
> >>> 
> >>> Crashes.
> >>> 
> >>>> I also observe that
> >>>> MUMPS sometimes fails because the size of the work arrays
> >>>> estimated during symbolic factorization is not sufficient for
> >>>> the actual numeric factorization with pivoting. But this is
> >>>> hardly a bug.
> >>> 
> >>> It has bugs with some versions of SCOTCH. We’ve been over this
> >>> before. What you describe above indeed isn’t a bug, just poor
> >>> software design in MUMPS.
> >>> 
> >>>> It can be analyzed simply by increasing the verbosity
> >>>> 
> >>>> PETScOptions.set('mat_mumps_icntl_4', 3)
> >>>> 
> >>>> and fixed by increasing the 'work array increase percentage'
> >>>> 
> >>>> PETScOptions.set('mat_mumps_icntl_14', 50)  # default=25
> >>>> 
> >>>> or by decreasing the pivoting threshold. I suspect that a
> >>>> frequent reason for this is using partitions that are too small
> >>>> (too many processes). (Users should also use Cholesky, and
> >>>> PD-Cholesky for positive-definite problems, whenever possible.
> >>>> The numerics are much better and more things are predictable in
> >>>> the analysis phase.)
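> >>>> 
> >>>> A minimal sketch putting these knobs together (the values are
> >>>> only illustrative, and mat_mumps_cntl_1 is assumed to map to
> >>>> the MUMPS relative pivoting threshold CNTL(1)):
> >>>> 
> >>>> from dolfin import PETScOptions
> >>>> 
> >>>> # ICNTL(4): verbosity of MUMPS diagnostics
> >>>> PETScOptions.set('mat_mumps_icntl_4', 3)
> >>>> # ICNTL(14): work array increase percentage (default 25)
> >>>> PETScOptions.set('mat_mumps_icntl_14', 50)
> >>>> # CNTL(1): smaller value means less pivoting
> >>>> PETScOptions.set('mat_mumps_cntl_1', 0.001)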
> >>>> 
> >>>> On the other hand, superlu_dist is computing rubbish without
> >>>> any warning for me and Johannes. Can you reproduce it?
> >>>> 
> >>> 
> >>> I haven’t had time to look. We should have unit testing for LU
> >>> solvers. From memory I don’t think we do.
> >> 
> >> Ok, the fix is here: switch the column ordering
> >> PETScOptions.set('mat_superlu_dist_colperm', col_ordering)
> >> 
> >> col_ordering      | properties
> >> --------------------------------------
> >> NATURAL           | works, large fill-in
> >> MMD_AT_PLUS_A     | works, smallest fill-in (for this case)
> >> MMD_ATA           | works, reasonable fill-in
> >> METIS_AT_PLUS_A   | computes rubbish (default on my system for this case)
> >> PARMETIS          | supported only in parallel, computes rubbish
> >> 
> >> or the row ordering
> >> PETScOptions.set('mat_superlu_dist_rowperm', row_ordering)
> >> 
> >> row_ordering      | properties
> >> --------------------------------------
> >> NATURAL           | works, good fill-in
> >> LargeDiag         | computes rubbish (default on my system for this case)
> >> 
> >> or both.
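> >> 
> >> As a workaround, the orderings can be forced before the solve; a
> >> minimal sketch, assuming the DOLFIN Python PETScOptions interface:
> >> 
> >> from dolfin import PETScOptions
> >> 
> >> # avoid the (Par)METIS column ordering and the LargeDiag row
> >> # permutation, which give wrong results here
> >> PETScOptions.set('mat_superlu_dist_colperm', 'MMD_AT_PLUS_A')
> >> PETScOptions.set('mat_superlu_dist_rowperm', 'NATURAL')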
> >> 
> > 
> > Good digging. Is there any way to know when superlu_dist is going
> > to return garbage? It’s concerning that it can silently return a
> > solution that is way off.
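> > 
> > Perhaps the only cheap safeguard is an explicit residual check
> > after the solve; a minimal sketch, assuming an assembled matrix A,
> > right-hand side b and computed solution x in the DOLFIN Python API:
> > 
> > # a large ||Ax - b|| relative to ||b|| flags a bad factorization
> > r = A*x
> > r.axpy(-1.0, b)
> > print('residual %g' % r.norm('l2'))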
> > 
> > Garth
> > 
> >> Jan
> >> 
> >>> 
> >>> Garth
> >>> 
> >>>> Jan
> >>>> 
> >>>>> 
> >>>>> Garth
> >>>>> 
> >>>>>> Jan
> >>>>>> 
> >>>>>>> 
> >>>>>>> Johannes
> >>> 
> > 
> 

_______________________________________________
fenics-support mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics-support
