On 30 Apr 2014, at 14:33, Jan Blechta <[email protected]> wrote:
> On Wed, 30 Apr 2014 08:53:34 +0200, Jan Blechta <[email protected]> wrote:
>
>> On Tue, 29 Apr 2014 22:55:13 +0200, "Garth N. Wells" <[email protected]> wrote:
>>
>>> I've switched the default parallel LU solver back to MUMPS and set
>>> MUMPS to use AMD ordering (anything other than METIS ...), which
>>> seems to avoid MUMPS crashing when PETSc is configured with recent
>>> METIS versions.
>>
>> We also suffered from segfaults in METIS called by MUMPS. As I remember,
>> this has something to do with a library mismatch, because PETSc typically
>> downloads its own METIS and DOLFIN is compiled against another. I will
>> ask Jaroslav Hron, who solved the issue here, and let you know.
>
> OK, there are a few issues:
>
> 1. MUMPS segfaults with METIS 5.1. This is no longer an issue, as PETSc
> 3.3, 3.4 and master download METIS 5.0, see
> https://bitbucket.org/petsc/petsc/commits/1b7e3bd. Also, Dorsal
> configures PETSc with --download-metis=1, so a working METIS is picked up.

PETSc dev with --download-metis=1 segfaults for me on OS X when MUMPS calls
the METIS ordering. I link to the version of METIS downloaded and built by
PETSc.

Garth

> 2. There is some mess in the rpaths in PETSc since PETSc switched from the
> make-based installer to the python-based installer. But this was reported
> to the PETSc team (on petsc-maint, so it is not available to the public)
> and assigned to Satish/Jed, so it will be fixed. As I understand the
> issue, the problem basically is that some rpaths in libpetsc.so or other
> libraries compiled by PETSc still point into the build dir instead of the
> install dir. We do something like
>
>   $ chrpath --delete $(PREFIX)/lib/libpetsc.so
>
> here and then use LD_LIBRARY_PATH to set up the runtime linking. Sure,
> this is not bulletproof, especially when one has multiple libmetis.so
> libraries (one downloaded by PETSc and one which DOLFIN links to).
>
> 3. Crash of MUMPS with SCOTCH 6, see
> http://mumps.enseeiht.fr/index.php?page=faq#19. But in my experience,
> MUMPS does not automatically choose the SCOTCH ordering.
>
> As a result, I think that we don't need to pick the AMD ordering and can
> let MUMPS choose the best ordering at run-time. That at least works on our
> system, but I'm not sure whether the workaround for issue 2 above is
> influencing this.
>
> Jan
>
>> Jan
>>
>>> Garth
>>>
>>> On 27 Mar 2014, at 11:52, Garth N. Wells <[email protected]> wrote:
>>>
>>>> On 26 Mar 2014, at 18:45, Jan Blechta <[email protected]> wrote:
>>>>
>>>>> On Wed, 26 Mar 2014 17:16:13 +0100, "Garth N. Wells" <[email protected]> wrote:
>>>>>
>>>>>> On 26 Mar 2014, at 16:56, Jan Blechta <[email protected]> wrote:
>>>>>>
>>>>>>> On Wed, 26 Mar 2014 16:29:11 +0100, "Garth N. Wells" <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 26 Mar 2014, at 16:26, Jan Blechta <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Wed, 26 Mar 2014 16:16:25 +0100, Johannes Ring <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Mar 26, 2014 at 1:39 PM, Jan Blechta <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> As a follow-up to the 'Broken PETSc wrappers?' thread on this
>>>>>>>>>>> list, can anyone reproduce an incorrect (orders of magnitude off)
>>>>>>>>>>> norm using superlu_dist on the following example? Both in
>>>>>>>>>>> serial and parallel.
>>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> This is the result I got:
>>>>>>>>>>
>>>>>>>>>> Serial:
>>>>>>>>>>
>>>>>>>>>>   L2 norm mumps         0.611356580181
>>>>>>>>>>   L2 norm superlu_dist  92.4733890983
>>>>>>>>>>
>>>>>>>>>> Parallel (2 processes):
>>>>>>>>>>
>>>>>>>>>>   L2 norm mumps         0.611356580181
>>>>>>>>>>   L2 norm superlu_dist  220.027905995
>>>>>>>>>>   L2 norm mumps         0.611356580181
>>>>>>>>>>   L2 norm superlu_dist  220.027905995
>>>>>>>>>
>>>>>>>>> The superlu_dist results are obviously wrong. Do we have broken
>>>>>>>>> installations or is there something wrong with the library?
>>>>>>>>>
>>>>>>>>> In the latter case I would suggest switching the default back
>>>>>>>>> to MUMPS. (Additionally, MUMPS has a Cholesky factorization!)
>>>>>>>>> What was your motivation for switching to superlu_dist, Garth?
>>>>>>>>
>>>>>>>> MUMPS often fails in parallel with global dofs, and there is
>>>>>>>> no indication that the MUMPS developers are willing to fix bugs.
>>>>>>>
>>>>>>> I'm not sure what you mean by 'MUMPS fails'.
>>>>>>
>>>>>> Crashes.
>>>>>>
>>>>>>> I also observe that MUMPS sometimes fails because the size of the
>>>>>>> work arrays estimated during the symbolic factorization is not
>>>>>>> sufficient for the actual numeric factorization with pivoting.
>>>>>>> But this is hardly a bug.
>>>>>>
>>>>>> It has bugs with some versions of SCOTCH. We've been over this
>>>>>> before. What you describe above indeed isn't a bug, but just poor
>>>>>> software design in MUMPS.
>>>>>>
>>>>>>> It can be analyzed simply by increasing the verbosity
>>>>>>>
>>>>>>>   PETScOptions.set('mat_mumps_icntl_4', 3)
>>>>>>>
>>>>>>> and fixed by increasing the 'work array increase percentage'
>>>>>>>
>>>>>>>   PETScOptions.set('mat_mumps_icntl_14', 50)  # default=25
>>>>>>>
>>>>>>> or by decreasing the pivoting threshold. I suspect that a frequent
>>>>>>> reason for this is using partitions that are too small (too many
>>>>>>> processes). (Users should also use Cholesky and PD-Cholesky
>>>>>>> whenever possible. The numerics are much better and more things
>>>>>>> are predictable in the analysis phase.)
>>>>>>>
>>>>>>> On the other hand, superlu_dist is computing rubbish without any
>>>>>>> warning for me and Johannes. Can you reproduce it?
>>>>>>
>>>>>> I haven't had time to look. We should have unit testing for LU
>>>>>> solvers. From memory I don't think we do.
>>>>>
>>>>> OK, the fix is to switch either the column ordering
>>>>>
>>>>>   PETScOptions.set('mat_superlu_dist_colperm', col_ordering)
>>>>>
>>>>>   col_ordering    | properties
>>>>>   ----------------+------------------------------------------------
>>>>>   NATURAL         | works, large fill-in
>>>>>   MMD_AT_PLUS_A   | works, smallest fill-in (for this case)
>>>>>   MMD_ATA         | works, reasonable fill-in
>>>>>   METIS_AT_PLUS_A | computes rubbish (default on my system for this case)
>>>>>   PARMETIS        | supported only in parallel, computes rubbish
>>>>>
>>>>> or the row ordering
>>>>>
>>>>>   PETScOptions.set('mat_superlu_dist_rowperm', row_ordering)
>>>>>
>>>>>   row_ordering | properties
>>>>>   -------------+---------------------------------------------------
>>>>>   NATURAL      | works, good fill-in
>>>>>   LargeDiag    | computes rubbish (default on my system for this case)
>>>>>
>>>>> or both.
>>>>
>>>> Good digging. Is there any way to know when superlu_dist is going to
>>>> return garbage? It's concerning that it can silently return a
>>>> solution that is way off.
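
On the question above of knowing when superlu_dist has silently returned
garbage: a cheap sanity check is the relative residual after the solve.
Below is a rough sketch, not the example from the thread (which is not
reproduced here); it assumes a stand-in Poisson problem, the DOLFIN Python
interface, and a PETSc build that provides superlu_dist:

    from dolfin import *

    # Stand-in test problem (not the one from the thread): a simple
    # Poisson equation, just to have a system to factorize.
    mesh = UnitSquareMesh(32, 32)
    V = FunctionSpace(mesh, "Lagrange", 1)
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(grad(u), grad(v))*dx
    L = Constant(1.0)*v*dx
    bc = DirichletBC(V, 0.0, "on_boundary")
    A, b = assemble_system(a, L, bc)

    x = Function(V)
    solver = LUSolver(A, "superlu_dist")   # or "mumps"
    solver.solve(x.vector(), b)

    # Relative residual ||A*x - b|| / ||b||; a large value flags a bad solve.
    r = A*x.vector()
    r -= b
    rel_res = r.norm("l2")/b.norm("l2")
    if rel_res > 1e-8:   # tolerance picked arbitrarily for illustration
        print("Suspicious LU solution: relative residual = %g" % rel_res)

This only detects a bad factorization after the fact, of course; it does
not explain why the METIS-based orderings go wrong.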
>>>>
>>>> Garth
>>>>
>>>>> Jan
>>>>>
>>>>>> Garth
>>>>>>
>>>>>>> Jan
>>>>>>>
>>>>>>>> Garth
>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>>> Johannes
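
To collect the workarounds mentioned in this thread in one place: a short
sketch, assuming the DOLFIN Python interface with the PETSc backend; the
particular ordering and workspace percentage are simply the values reported
above, not general recommendations. The options need to be set before the
solve:

    from dolfin import PETScOptions

    # SuperLU_dist: switch the column ordering away from the
    # (PAR)METIS-based default that produced wrong norms in the tests above.
    PETScOptions.set("mat_superlu_dist_colperm", "MMD_AT_PLUS_A")

    # MUMPS: more verbose diagnostics and a larger 'work array increase
    # percentage' (default 25), to avoid failures during the numeric
    # factorization with pivoting.
    PETScOptions.set("mat_mumps_icntl_4", 3)
    PETScOptions.set("mat_mumps_icntl_14", 50)

Which of the two packages is actually used still depends on the LU method
requested when the solver is created, e.g. LUSolver(A, "superlu_dist") or
LUSolver(A, "mumps").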
