On Wed, 30 Apr 2014 15:21:07 +0200
"Garth N. Wells" <[email protected]> wrote:

> 
> On 30 Apr 2014, at 14:33, Jan Blechta <[email protected]>
> wrote:
> 
> > On Wed, 30 Apr 2014 08:53:34 +0200
> > Jan Blechta <[email protected]> wrote:
> > 
> >> On Tue, 29 Apr 2014 22:55:13 +0200
> >> "Garth N. Wells" <[email protected]> wrote:
> >> 
> >>> I’ve switched the default parallel LU solver back to MUMPS and set
> >>> MUMPS to use AMD ordering (anything other than METIS . . ), which
> >>> seems to avoid MUMPS crashing when PETSc is configured with recent
> >>> METIS versions.
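> >>> 
> >>> In DOLFIN this can be forced with something along these lines (just a
> >>> sketch; ICNTL(7)=0 selects the AMD ordering in MUMPS):
> >>> 
> >>> PETScOptions.set('mat_mumps_icntl_7', 0)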
> >> 
> >> We also suffered from segfaults in METIS called by MUMPS. As I
> >> remember, this has something to do with a library mismatch: PETSc
> >> typically downloads its own METIS while DOLFIN is compiled against
> >> another. I will ask Jaroslav Hron, who solved the issue here, and let
> >> you know.
> > 
> > Ok, there are a few issues:
> > 
> > 1. MUMPS segfaults with METIS 5.1. This is no longer an issue, as PETSc
> >    3.3, 3.4 and master download METIS 5.0, see
> >    https://bitbucket.org/petsc/petsc/commits/1b7e3bd. Also, Dorsal
> >    configures PETSc with --download-metis=1, so a working METIS is
> >    picked.
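> > 
> >    For example, a configure line along these lines (just a sketch; the
> >    MUMPS/ScaLAPACK flags are only illustrative):
> > 
> >    ./configure --download-metis=1 --download-mumps=1 --download-scalapack=1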
> > 
> 
> PETSc dev with --download-metis=1 segfaults for me on OSX when MUMPS
> calls METIS ordering. I link to the version of METIS downloaded and
> built by PETSc. 

Isn't there a cached METIS 5.1 in your build tree? You could try cleaning
the tree to let the PETSc build system download METIS 5.0.
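Something like this should work (just a sketch; the externalpackages
location depends on the PETSc version and arch):

  rm -rf $PETSC_DIR/$PETSC_ARCH/externalpackages
  ./configure --download-metis=1 ...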

Jan

> 
> Garth 
> 
> > 2. There is some mess with rpaths in PETSc since it switched from the
> >    make-based installer to the python-based installer. This was reported
> >    to the PETSc team (on petsc-maint, so it is not publicly visible) and
> >    assigned to Satish/Jed, so it will be fixed. As I understand it, the
> >    problem is basically that some rpaths in libpetsc.so and other
> >    libraries compiled by PETSc point into the build dir instead of the
> >    install dir. Here we do something like
> >    $ chrpath --delete $(PREFIX)/lib/libpetsc.so
> >    and then use LD_LIBRARY_PATH to set up runtime linking. Sure, this is
> >    not bullet-proof, especially when one has multiple libmetis.so (one
> >    downloaded by PETSc and one which DOLFIN links to).
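> > 
> >    I.e. roughly (just a sketch; PREFIX is the PETSc install prefix):
> > 
> >    $ chrpath --delete $PREFIX/lib/libpetsc.so
> >    $ export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH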
> > 
> > 3. Crash of MUMPS with SCOTCH 6, see
> >    http://mumps.enseeiht.fr/index.php?page=faq#19. But in my experience,
> >    MUMPS does not automatically choose SCOTCH ordering.
> > 
> > As a result, I think that we don't need to force AMD ordering and can
> > let MUMPS choose the best ordering at run-time. This at least works on
> > our system, but I'm not sure whether the workaround from point 2 above
> > influences this.
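> > 
> > (For reference, ICNTL(7)=7 is the automatic ordering choice, i.e.
> > PETScOptions.set('mat_mumps_icntl_7', 7), though that should already be
> > the MUMPS default.)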
> > 
> > Jan
> > 
> >> 
> >> Jan
> >> 
> >>> 
> >>> Garth
> >>> 
> >>> On 27 Mar 2014, at 11:52, Garth N. Wells <[email protected]> wrote:
> >>> 
> >>>> 
> >>>> On 26 Mar 2014, at 18:45, Jan Blechta
> >>>> <[email protected]> wrote:
> >>>> 
> >>>>> On Wed, 26 Mar 2014 17:16:13 +0100
> >>>>> "Garth N. Wells" <[email protected]> wrote:
> >>>>> 
> >>>>>> 
> >>>>>> On 26 Mar 2014, at 16:56, Jan Blechta
> >>>>>> <[email protected]> wrote:
> >>>>>> 
> >>>>>>> On Wed, 26 Mar 2014 16:29:11 +0100
> >>>>>>> "Garth N. Wells" <[email protected]> wrote:
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>> On 26 Mar 2014, at 16:26, Jan Blechta
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>> 
> >>>>>>>>> On Wed, 26 Mar 2014 16:16:25 +0100
> >>>>>>>>> Johannes Ring <[email protected]> wrote:
> >>>>>>>>> 
> >>>>>>>>>> On Wed, Mar 26, 2014 at 1:39 PM, Jan Blechta
> >>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>> As a follow-up to the 'Broken PETSc wrappers?' thread on this
> >>>>>>>>>>> list, can anyone reproduce an incorrect (off by orders of
> >>>>>>>>>>> magnitude) norm using superlu_dist on the following example?
> >>>>>>>>>>> Both in serial and parallel. Thanks,
> >>>>>>>>>> 
> >>>>>>>>>> This is the result I got:
> >>>>>>>>>> 
> >>>>>>>>>> Serial:
> >>>>>>>>>> 
> >>>>>>>>>> L2 norm mumps        0.611356580181
> >>>>>>>>>> L2 norm superlu_dist 92.4733890983
> >>>>>>>>>> 
> >>>>>>>>>> Parallel (2 processes):
> >>>>>>>>>> 
> >>>>>>>>>> L2 norm mumps        0.611356580181
> >>>>>>>>>> L2 norm superlu_dist 220.027905995
> >>>>>>>>>> L2 norm mumps        0.611356580181
> >>>>>>>>>> L2 norm superlu_dist 220.027905995
> >>>>>>>>> 
> >>>>>>>>> The superlu_dist results are obviously wrong. Do we have broken
> >>>>>>>>> installations, or is there something wrong with the library?
> >>>>>>>>> 
> >>>>>>>>> In the latter case I would suggest switching the default
> >>>>>>>>> back to MUMPS. (Additionally, MUMPS has Cholesky
> >>>>>>>>> factorization!) What was your motivation for switching to
> >>>>>>>>> superlu_dist, Garth?
> >>>>>>>>> 
> >>>>>>>> 
> >>>>>>>> MUMPS often fails in parallel with global dofs, and there is
> >>>>>>>> no indication that MUMPS developers are willing to fix bugs.
> >>>>>>> 
> >>>>>>> I'm not sure what you mean by 'MUMPS fails'.
> >>>>>> 
> >>>>>> Crashes.
> >>>>>> 
> >>>>>>> I also observe that
> >>>>>>> MUMPS sometimes fails because the size of the work arrays estimated
> >>>>>>> during symbolic factorization is not sufficient for the actual
> >>>>>>> numeric factorization with pivoting. But this is hardly a bug.
> >>>>>> 
> >>>>>> It has bugs with some versions of SCOTCH. We've been over this
> >>>>>> before. What you describe above indeed isn't a bug, but just
> >>>>>> poor software design in MUMPS.
> >>>>>> 
> >>>>>>> It can be analyzed simply by increasing the verbosity
> >>>>>>> 
> >>>>>>> PETScOptions.set('mat_mumps_icntl_4', 3)
> >>>>>>> 
> >>>>>>> and fixed by increasing the 'work array increase percentage'
> >>>>>>> 
> >>>>>>> PETScOptions.set('mat_mumps_icntl_14', 50)  # default=25
> >>>>>>> 
> >>>>>>> or by decreasing the pivoting threshold. I suspect that a frequent
> >>>>>>> reason for this is partitions that are too small (too many
> >>>>>>> processes). (Users should also use Cholesky and PD-Cholesky
> >>>>>>> whenever possible; the numerics are much better and more things are
> >>>>>>> predictable in the analysis phase.)
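> >>>>>>> 
> >>>>>>> For the pivoting threshold, something like this should work (just a
> >>>>>>> sketch; CNTL(1) is the relative pivoting threshold, typically 0.01
> >>>>>>> by default):
> >>>>>>> 
> >>>>>>> PETScOptions.set('mat_mumps_cntl_1', 0.001)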
> >>>>>>> 
> >>>>>>> On the other hand, superlu_dist is computing rubbish without any
> >>>>>>> warning for me and Johannes. Can you reproduce this?
> >>>>>>> 
> >>>>>> 
> >>>>>> I haven’t had time to look. We should have unit testing for LU
> >>>>>> solvers. From memory I don’t think we do.
> >>>>> 
> >>>>> Ok, the fix is to switch the column ordering
> >>>>> 
> >>>>> PETScOptions.set('mat_superlu_dist_colperm', col_ordering)
> >>>>> 
> >>>>> col_ordering      | properties
> >>>>> --------------------------------------
> >>>>> NATURAL           | works, large fill-in
> >>>>> MMD_AT_PLUS_A     | works, smallest fill-in (for this case)
> >>>>> MMD_ATA           | works, reasonable fill-in
> >>>>> METIS_AT_PLUS_A   | computes rubbish (default on my system for this case)
> >>>>> PARMETIS          | supported only in parallel, computes rubbish
> >>>>> 
> >>>>> or the row ordering
> >>>>> 
> >>>>> PETScOptions.set('mat_superlu_dist_rowperm', row_ordering)
> >>>>> 
> >>>>> row_ordering      | properties
> >>>>> --------------------------------------
> >>>>> NATURAL           | works, good fill-in
> >>>>> LargeDiag         | computes rubbish (default on my system for this case)
> >>>>> 
> >>>>> or both.
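> >>>>> 
> >>>>> E.g. (just a sketch; the solver construction may differ between
> >>>>> DOLFIN versions, and A, x, b are the assembled matrix and vectors):
> >>>>> 
> >>>>> PETScOptions.set('mat_superlu_dist_colperm', 'MMD_AT_PLUS_A')
> >>>>> PETScOptions.set('mat_superlu_dist_rowperm', 'NATURAL')
> >>>>> solver = PETScLUSolver('superlu_dist')
> >>>>> solver.solve(A, x, b)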
> >>>>> 
> >>>> 
> >>>> Good digging. Is there any way to know when superlu_dist is going
> >>>> to return garbage? It's concerning that it can silently return a
> >>>> solution that is way off.
> >>>> 
> >>>> Garth
> >>>> 
> >>>>> Jan
> >>>>> 
> >>>>>> 
> >>>>>> Garth
> >>>>>> 
> >>>>>>> Jan
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>> Garth
> >>>>>>>> 
> >>>>>>>>> Jan
> >>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> Johannes
> >>>>>> 
> >>>> 
> >>> 
> >> 
> 

_______________________________________________
fenics-support mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics-support
