On 30 Apr 2014, at 14:33, Jan Blechta <[email protected]> wrote:
> On Wed, 30 Apr 2014 08:53:34 +0200, Jan Blechta <[email protected]> wrote:
>
>> On Tue, 29 Apr 2014 22:55:13 +0200, "Garth N. Wells" <[email protected]> wrote:
>>
>>> I've switched the default parallel LU solver back to MUMPS and set
>>> MUMPS to use AMD ordering (anything other than METIS ...), which
>>> seems to avoid MUMPS crashing when PETSc is configured with recent
>>> METIS versions.
>>
>> We also suffered from segfaults in METIS called by MUMPS. As I remember,
>> this has something to do with a library mismatch, because PETSc typically
>> downloads its own METIS and DOLFIN is compiled against another. I will
>> ask Jaroslav Hron, who solved the issue here, and let you know.
>
> OK, there are a few issues:
>
> 1. MUMPS segfaults with METIS 5.1. This is no longer an issue, as PETSc
> 3.3, 3.4 and master download METIS 5.0, see
> https://bitbucket.org/petsc/petsc/commits/1b7e3bd. Also, Dorsal
> configures PETSc with --download-metis=1, so a working METIS is picked up.

PETSc dev with --download-metis=1 segfaults for me on OS X when MUMPS calls
the METIS ordering. I link to the version of METIS downloaded and built by
PETSc.

Garth

> 2. There is some mess in the rpaths in PETSc since PETSc switched from the
> make-based installer to the python-based installer. But this was reported
> to the PETSc team (on petsc-maint, so it is not available to the public)
> and assigned to Satish/Jed, so it will be fixed. As I understand the
> issue, the problem basically is that some rpaths in libpetsc.so or other
> libraries compiled by PETSc still point into the build dir instead of the
> install dir. We do something like
>
>   $ chrpath --delete $(PREFIX)/lib/libpetsc.so
>
> here and then use LD_LIBRARY_PATH to set up the runtime linking. Sure,
> this is not bulletproof, especially when one has multiple libmetis.so
> libraries (one downloaded by PETSc and one which DOLFIN links to).
>
> 3. Crash of MUMPS with SCOTCH 6, see
> http://mumps.enseeiht.fr/index.php?page=faq#19. But in my experience,
> MUMPS does not automatically choose the SCOTCH ordering.
>
> As a result, I think that we don't need to pick the AMD ordering and can
> let MUMPS choose the best ordering at run-time. That at least works on our
> system, but I'm not sure whether the workaround for issue 2 above is
> influencing this.
>
> Jan
>
>> Jan
>>
>>> Garth
>>>
>>> On 27 Mar 2014, at 11:52, Garth N. Wells <[email protected]> wrote:
>>>
>>>> On 26 Mar 2014, at 18:45, Jan Blechta <[email protected]> wrote:
>>>>
>>>>> On Wed, 26 Mar 2014 17:16:13 +0100, "Garth N. Wells" <[email protected]> wrote:
>>>>>
>>>>>> On 26 Mar 2014, at 16:56, Jan Blechta <[email protected]> wrote:
>>>>>>
>>>>>>> On Wed, 26 Mar 2014 16:29:11 +0100, "Garth N. Wells" <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 26 Mar 2014, at 16:26, Jan Blechta <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Wed, 26 Mar 2014 16:16:25 +0100, Johannes Ring <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Mar 26, 2014 at 1:39 PM, Jan Blechta <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> As a follow-up to the 'Broken PETSc wrappers?' thread on this
>>>>>>>>>>> list, can anyone reproduce an incorrect (orders of magnitude off)
>>>>>>>>>>> norm using superlu_dist on the following example? Both in
>>>>>>>>>>> serial and parallel.
>>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> This is the result I got:
>>>>>>>>>>
>>>>>>>>>> Serial:
>>>>>>>>>>
>>>>>>>>>>   L2 norm mumps         0.611356580181
>>>>>>>>>>   L2 norm superlu_dist  92.4733890983
>>>>>>>>>>
>>>>>>>>>> Parallel (2 processes):
>>>>>>>>>>
>>>>>>>>>>   L2 norm mumps         0.611356580181
>>>>>>>>>>   L2 norm superlu_dist  220.027905995
>>>>>>>>>>   L2 norm mumps         0.611356580181
>>>>>>>>>>   L2 norm superlu_dist  220.027905995
>>>>>>>>>
>>>>>>>>> The superlu_dist results are obviously wrong. Do we have broken
>>>>>>>>> installations or is there something wrong with the library?
>>>>>>>>>
>>>>>>>>> In the latter case I would suggest switching the default back
>>>>>>>>> to MUMPS. (Additionally, MUMPS has a Cholesky factorization!)
>>>>>>>>> What was your motivation for switching to superlu_dist, Garth?
>>>>>>>>
>>>>>>>> MUMPS often fails in parallel with global dofs, and there is
>>>>>>>> no indication that the MUMPS developers are willing to fix bugs.
>>>>>>>
>>>>>>> I'm not sure what you mean by 'MUMPS fails'.
>>>>>>
>>>>>> Crashes.
>>>>>>
>>>>>>> I also observe that MUMPS sometimes fails because the size of the
>>>>>>> work arrays estimated during the symbolic factorization is not
>>>>>>> sufficient for the actual numeric factorization with pivoting.
>>>>>>> But this is hardly a bug.
>>>>>>
>>>>>> It has bugs with some versions of SCOTCH. We've been over this
>>>>>> before. What you describe above indeed isn't a bug, but just poor
>>>>>> software design in MUMPS.
>>>>>>
>>>>>>> It can be analyzed simply by increasing the verbosity
>>>>>>>
>>>>>>>   PETScOptions.set('mat_mumps_icntl_4', 3)
>>>>>>>
>>>>>>> and fixed by increasing the 'work array increase percentage'
>>>>>>>
>>>>>>>   PETScOptions.set('mat_mumps_icntl_14', 50)  # default=25
>>>>>>>
>>>>>>> or by decreasing the pivoting threshold. I suspect that a frequent
>>>>>>> reason for this is using partitions that are too small (too many
>>>>>>> processes). (Users should also use Cholesky and PD-Cholesky
>>>>>>> whenever possible. The numerics are much better and more things
>>>>>>> are predictable in the analysis phase.)
>>>>>>>
>>>>>>> On the other hand, superlu_dist is computing rubbish without any
>>>>>>> warning for me and Johannes. Can you reproduce it?
>>>>>>
>>>>>> I haven't had time to look. We should have unit testing for LU
>>>>>> solvers. From memory I don't think we do.
>>>>>
>>>>> OK, the fix is to switch either the column ordering
>>>>>
>>>>>   PETScOptions.set('mat_superlu_dist_colperm', col_ordering)
>>>>>
>>>>>   col_ordering    | properties
>>>>>   ----------------+------------------------------------------------
>>>>>   NATURAL         | works, large fill-in
>>>>>   MMD_AT_PLUS_A   | works, smallest fill-in (for this case)
>>>>>   MMD_ATA         | works, reasonable fill-in
>>>>>   METIS_AT_PLUS_A | computes rubbish (default on my system for this case)
>>>>>   PARMETIS        | supported only in parallel, computes rubbish
>>>>>
>>>>> or the row ordering
>>>>>
>>>>>   PETScOptions.set('mat_superlu_dist_rowperm', row_ordering)
>>>>>
>>>>>   row_ordering | properties
>>>>>   -------------+---------------------------------------------------
>>>>>   NATURAL      | works, good fill-in
>>>>>   LargeDiag    | computes rubbish (default on my system for this case)
>>>>>
>>>>> or both.
>>>>
>>>> Good digging. Is there any way to know when superlu_dist is going to
>>>> return garbage? It's concerning that it can silently return a
>>>> solution that is way off.
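
On the question above of knowing when superlu_dist has silently returned
garbage: a cheap sanity check is the relative residual after the solve.
Below is a rough sketch, not the example from the thread (which is not
reproduced here); it assumes a stand-in Poisson problem, the DOLFIN Python
interface, and a PETSc build that provides superlu_dist:

    from dolfin import *

    # Stand-in test problem (not the one from the thread): a simple
    # Poisson equation, just to have a system to factorize.
    mesh = UnitSquareMesh(32, 32)
    V = FunctionSpace(mesh, "Lagrange", 1)
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(grad(u), grad(v))*dx
    L = Constant(1.0)*v*dx
    bc = DirichletBC(V, 0.0, "on_boundary")
    A, b = assemble_system(a, L, bc)

    x = Function(V)
    solver = LUSolver(A, "superlu_dist")   # or "mumps"
    solver.solve(x.vector(), b)

    # Relative residual ||A*x - b|| / ||b||; a large value flags a bad solve.
    r = A*x.vector()
    r -= b
    rel_res = r.norm("l2")/b.norm("l2")
    if rel_res > 1e-8:   # tolerance picked arbitrarily for illustration
        print("Suspicious LU solution: relative residual = %g" % rel_res)

This only detects a bad factorization after the fact, of course; it does
not explain why the METIS-based orderings go wrong.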
>>>>
>>>> Garth
>>>>
>>>>> Jan
>>>>>
>>>>>> Garth
>>>>>>
>>>>>>> Jan
>>>>>>>
>>>>>>>> Garth
>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>>> Johannes
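
To collect the workarounds mentioned in this thread in one place: a short
sketch, assuming the DOLFIN Python interface with the PETSc backend; the
particular ordering and workspace percentage are simply the values reported
above, not general recommendations. The options need to be set before the
solve:

    from dolfin import PETScOptions

    # SuperLU_dist: switch the column ordering away from the
    # (PAR)METIS-based default that produced wrong norms in the tests above.
    PETScOptions.set("mat_superlu_dist_colperm", "MMD_AT_PLUS_A")

    # MUMPS: more verbose diagnostics and a larger 'work array increase
    # percentage' (default 25), to avoid failures during the numeric
    # factorization with pivoting.
    PETScOptions.set("mat_mumps_icntl_4", 3)
    PETScOptions.set("mat_mumps_icntl_14", 50)

Which of the two packages is actually used still depends on the LU method
requested when the solver is created, e.g. LUSolver(A, "superlu_dist") or
LUSolver(A, "mumps").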
