I’ve switched the default parallel LU solver back to MUMPS and set MUMPS to use AMD ordering (anything other than METIS would do), which seems to avoid MUMPS crashing when PETSc is configured with recent METIS versions.
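If you want to force the ordering yourself, it can be set through the PETSc options database: MUMPS selects its ordering via ICNTL(7), where (per the MUMPS user guide) 0 is AMD and 5 is METIS. A minimal sketch, using the same DOLFIN PETScOptions interface that comes up later in this thread:

    from dolfin import PETScOptions

    # ICNTL(7) picks the fill-reducing ordering for the symbolic
    # factorization: 0 = AMD, 5 = METIS (see the MUMPS user guide).
    # Set this before the first solve so PETSc passes it on to MUMPS.
    PETScOptions.set('mat_mumps_icntl_7', 0)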
Garth

On 27 Mar 2014, at 11:52, Garth N. Wells <[email protected]> wrote:

> 
> On 26 Mar 2014, at 18:45, Jan Blechta <[email protected]> wrote:
> 
>> On Wed, 26 Mar 2014 17:16:13 +0100
>> "Garth N. Wells" <[email protected]> wrote:
>> 
>>> 
>>> On 26 Mar 2014, at 16:56, Jan Blechta <[email protected]> wrote:
>>> 
>>>> On Wed, 26 Mar 2014 16:29:11 +0100
>>>> "Garth N. Wells" <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> On 26 Mar 2014, at 16:26, Jan Blechta <[email protected]> wrote:
>>>>> 
>>>>>> On Wed, 26 Mar 2014 16:16:25 +0100
>>>>>> Johannes Ring <[email protected]> wrote:
>>>>>> 
>>>>>>> On Wed, Mar 26, 2014 at 1:39 PM, Jan Blechta
>>>>>>> <[email protected]> wrote:
>>>>>>>> As a follow-up to the 'Broken PETSc wrappers?' thread on this
>>>>>>>> list, can anyone reproduce an incorrect (orders of magnitude off)
>>>>>>>> norm using superlu_dist on the following example? Both in serial
>>>>>>>> and parallel. Thanks,
>>>>>>> 
>>>>>>> This is the result I got:
>>>>>>> 
>>>>>>> Serial:
>>>>>>> 
>>>>>>> L2 norm mumps        0.611356580181
>>>>>>> L2 norm superlu_dist 92.4733890983
>>>>>>> 
>>>>>>> Parallel (2 processes):
>>>>>>> 
>>>>>>> L2 norm mumps        0.611356580181
>>>>>>> L2 norm superlu_dist 220.027905995
>>>>>>> L2 norm mumps        0.611356580181
>>>>>>> L2 norm superlu_dist 220.027905995
>>>>>> 
>>>>>> The superlu_dist results are obviously wrong. Do we have broken
>>>>>> installations, or is there something wrong with the library?
>>>>>> 
>>>>>> In the latter case I would suggest switching the default back to
>>>>>> MUMPS. (Additionally, MUMPS has Cholesky factorization!) What was
>>>>>> your motivation for switching to superlu_dist, Garth?
>>>>>> 
>>>>> 
>>>>> MUMPS often fails in parallel with global dofs, and there is no
>>>>> indication that the MUMPS developers are willing to fix bugs.
>>>> 
>>>> I'm not sure what you mean by 'MUMPS fails’.
>>> 
>>> Crashes.
>>> 
>>>> I also observe that MUMPS sometimes fails because the size of the
>>>> work arrays estimated during symbolic factorization is not
>>>> sufficient for the actual numeric factorization with pivoting. But
>>>> this is hardly a bug.
>>> 
>>> It has bugs with some versions of SCOTCH. We’ve been over this
>>> before. What you describe above indeed isn’t a bug, but just poor
>>> software design in MUMPS.
>>> 
>>>> It can be analyzed simply by increasing the verbosity
>>>> 
>>>> PETScOptions.set('mat_mumps_icntl_4', 3)
>>>> 
>>>> and fixed by increasing the 'work array increase percentage'
>>>> 
>>>> PETScOptions.set('mat_mumps_icntl_14', 50)  # default=25
>>>> 
>>>> or by decreasing the pivoting threshold. I suspect that a frequent
>>>> reason for this is using partitions that are too small (too many
>>>> processes). (Users should also use Cholesky, and positive-definite
>>>> Cholesky whenever possible. The numerics are much better and more
>>>> things are predictable in the analysis phase.)
>>>> 
>>>> On the other hand, superlu_dist is computing rubbish without any
>>>> warning for me and Johannes. Can you reproduce that?
>>>> 
>>> 
>>> I haven’t had time to look. We should have unit testing for LU
>>> solvers. From memory, I don’t think we do.
>> 
>> OK, the fix is to switch the column ordering
>> 
>> PETScOptions.set('mat_superlu_dist_colperm', col_ordering)
>> 
>> col_ordering    | properties
>> ----------------|----------------------------------------------------
>> NATURAL         | works, large fill-in
>> MMD_AT_PLUS_A   | works, smallest fill-in (for this case)
>> MMD_ATA         | works, reasonable fill-in
>> METIS_AT_PLUS_A | computes rubbish (default on my system for this case)
>> PARMETIS        | supported only in parallel, computes rubbish
>> 
>> or the row ordering
>> 
>> PETScOptions.set('mat_superlu_dist_rowperm', row_ordering)
>> 
>> row_ordering | properties
>> -------------|-------------------------------------------------------
>> NATURAL      | works, good fill-in
>> LargeDiag    | computes rubbish (default on my system for this case)
>> 
>> or both.
>> 
> 
> Good digging. Is there any way to know when superlu_dist is going to
> return garbage? It’s concerning that it can silently return a solution
> that is way off.
> 
> Garth
> 
>> Jan
>> 
>>> 
>>> Garth
>>> 
>>>> Jan
>>>> 
>>>>> 
>>>>> Garth
>>>>> 
>>>>>> Jan
>>>>>> 
>>>>>>> 
>>>>>>> Johannes
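To tie the thread together, here is a minimal sketch of the kind of comparison discussed above, with Jan's column-ordering workaround applied up front. The Poisson problem is a stand-in (the original test script was an attachment and is not reproduced in the thread), and the sketch assumes a DOLFIN build where LUSolver accepts a backend name such as 'mumps' or 'superlu_dist', so the norms will differ from those Johannes reported:

    from dolfin import *

    # Jan's workaround: the default METIS_AT_PLUS_A column ordering gave
    # wrong answers, so force MMD_AT_PLUS_A before the first solve.
    PETScOptions.set('mat_superlu_dist_colperm', 'MMD_AT_PLUS_A')

    # Stand-in Poisson problem on the unit square.
    mesh = UnitSquareMesh(32, 32)
    V = FunctionSpace(mesh, 'CG', 1)
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(grad(u), grad(v))*dx
    L = Constant(1.0)*v*dx
    bc = DirichletBC(V, 0.0, 'on_boundary')

    A, b = assemble_system(a, L, bc)

    # Solve with each LU backend and compare the norms.
    for method in ['mumps', 'superlu_dist']:
        uh = Function(V)
        LUSolver(method).solve(A, uh.vector(), b)
        print('L2 norm %-13s %.12g' % (method, norm(uh, 'L2')))

With a working superlu_dist ordering, the two backends should agree to solver tolerance; a discrepancy of orders of magnitude, as in the results above, is the symptom being discussed.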
