Surely the group_comm object does not exist on processes outside the group,
and the Expression object construction can only happen within the group?

I don't see how anything else makes sense. But a clear docstring is always
good.

Btw, can we assert that the JIT signatures match across the group? I'm a
bit nervous about bugs in non-uniform MPI programs, and that would be a good
early indicator of something funny happening.
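For concreteness, such a check could look something like the sketch below
(a hypothetical helper, not existing DOLFIN code): allgather the signature
over the group communicator with mpi4py and fail early if the ranks disagree.

    from mpi4py import MPI

    def assert_uniform_signature(signature, comm=MPI.COMM_WORLD):
        # Gather the JIT signature from every rank of `comm` and raise if
        # they are not all identical (a sign of a non-uniform MPI program).
        signatures = comm.allgather(signature)
        if any(s != signatures[0] for s in signatures):
            raise RuntimeError("JIT signature differs across the group: %r"
                               % (signatures,))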

Martin
On 26 Nov 2014 at 09:43, "Garth N. Wells" <[email protected]> wrote:

> On Wed, 26 Nov, 2014 at 8:32 AM, Johan Hake <[email protected]> wrote:
>
>> On Wed, Nov 26, 2014 at 9:22 AM, Garth N. Wells <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, 26 Nov, 2014 at 7:50 AM, Johan Hake <[email protected]> wrote:
>>>
>>>> On Wed, Nov 26, 2014 at 8:34 AM, Garth N. Wells <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, 25 Nov, 2014 at 9:48 PM, Johan Hake <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I just pushed some fixes to the jit interface of DOLFIN. Now one can
>>>>>> jit on different mpi groups.
>>>>>>
>>>>>
>>>>> Nice.
>>>>>
>>>>>  Previously JITing was only done on rank 1 of the mpi_comm_world. Now
>>>>>> it is done on rank 1 of any passed group communicator.
>>>>>>
>>>>>
>>>>> Do you mean rank 0?
>>>>>
>>>>
>>>> Yes, of course.
>>>>
>>>>
>>>>  There is no demo atm showing this but a test has been added:
>>>>>>
>>>>>>   test/unit/python/jit/test_jit_with_mpi_groups.py
>>>>>>
>>>>>> Here an expression, a subdomain, and a form are constructed on
>>>>>> different ranks using groups. It is somewhat tedious, as one needs to
>>>>>> initialize PETSc with the same group, otherwise PETSc will deadlock
>>>>>> during initialization (the moment a PETSc linear algebra object is
>>>>>> constructed).
>>>>>>
>>>>>
>>>>> This is ok. It's arguably a design flaw that we don't make the user
>>>>> handle MPI initialisation manually.
>>>>>
>>>>
>>>> Sure, it is just somewhat tedious. You cannot start your typical
>>>> script by importing dolfin.
>>>>
>>>>  The procedure in Python for this is (see the sketch after the list):
>>>>>>
>>>>>> 1) Construct MPI groups using mpi4py
>>>>>> 2) Initialize petsc4py using the groups
>>>>>> 3) Wrap the groups as petsc4py communicators (dolfin only supports
>>>>>>    petsc4py, not mpi4py)
>>>>>> 4) import dolfin
>>>>>> 5) Do group-specific stuff:
>>>>>>    a) Functions and forms: no change needed, as the communicator
>>>>>>       is passed via the mesh
>>>>>>    b) domain = CompiledSubDomain("...", mpi_comm=group_comm)
>>>>>>    c) e = Expression("...", mpi_comm=group_comm)
>>>>>>
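>>>>>> A minimal sketch of these steps; the split criterion, the expression
>>>>>> strings, and the comm argument to petsc4py.init are illustrative
>>>>>> assumptions (check your petsc4py version):
>>>>>>
>>>>>>   # 1) Build group communicators with mpi4py (here: split by even/odd rank).
>>>>>>   from mpi4py import MPI
>>>>>>   world = MPI.COMM_WORLD
>>>>>>   group_comm = world.Split(color=world.rank % 2, key=world.rank)
>>>>>>
>>>>>>   # 2) Initialize petsc4py with the group communicator before dolfin
>>>>>>   #    is imported, so PETSc is not set up on mpi_comm_world.
>>>>>>   import petsc4py
>>>>>>   petsc4py.init(comm=group_comm)  # comm kwarg assumed available
>>>>>>
>>>>>>   # 3) Wrap the mpi4py communicator as a petsc4py communicator
>>>>>>   #    (assuming PETSc.Comm accepts an mpi4py communicator here).
>>>>>>   from petsc4py import PETSc
>>>>>>   petsc_comm = PETSc.Comm(group_comm)
>>>>>>
>>>>>>   # 4) Only now import dolfin.
>>>>>>   from dolfin import CompiledSubDomain, Expression
>>>>>>
>>>>>>   # 5) Group-specific objects; JIT happens on rank 0 of group_comm.
>>>>>>   domain = CompiledSubDomain("near(x[0], 0.0)", mpi_comm=petsc_comm)
>>>>>>   e = Expression("sin(x[0])", mpi_comm=petsc_comm)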
>>>>>
>>>>> It's not so clear whether passing the communicator means that the
>>>>> Expression is only defined/available on group_comm, or if group_comm is
>>>>> simply to control who does the JIT. Could you clarify this?
>>>>>
>>>>
>>>> My knowledge of MPI is not that good. I have only tried to access (and
>>>> construct) the Expression on ranks included in that group. Also, when I
>>>> tried to construct one using a group communicator on a rank that is not
>>>> included in the group, I got an error when calling MPI_size on it. There
>>>> is probably a perfectly reasonable explanation for this.
>>>>
>>>
>>> Could you clarify what goes on behind-the-scenes with the communicator?
>>> Is it only used in a call to get the process rank? What do the ranks other
>>> than zero do?
>>>
>>
>> Not sure what you want to know. Instead of using mpi_comm_world to
>> construct meshes you use the group communicator. This communicator has its
>> own local group of ranks. JITing is still done on rank 0 of the local
>> group, which might be, and most often is, different from the rank 0
>> process of mpi_comm_world.
>>
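>> For example, in a small mpi4py sketch (the even/odd split is just for
>> illustration; run with e.g. 4 processes), the process that is rank 0 of
>> its group need not be world rank 0:
>>
>>   from mpi4py import MPI
>>   world = MPI.COMM_WORLD
>>   group_comm = world.Split(color=world.rank % 2, key=world.rank)
>>   # With 4 processes: world ranks 0 and 2 form one group, 1 and 3 the
>>   # other, so world rank 1 is rank 0 of its group and does the JIT there.
>>   print("world rank %d -> group rank %d" % (world.rank, group_comm.rank))
>>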
>
> I just want to be clear (and have in the docstring) that
>
>    e = Expression("...", mpi_comm=group_comm)
>
> is valid only on group_comm (if this is the case), or make clear that the
> communicator only determines the process that does the JIT.
>
> If we required all Expressions to have a domain/mesh, as Martin advocates,
> things would be clearer.
>
>  The group communicator works exactly like the world communicator, but now
>> on just a subset of the processes. There were some sharp edges, with
>> deadlocks as a consequence, when barriers were taken on the world
>> communicator. This happens by default when dolfin is imported and PETSc
>> gets initialized with the world communicator, so we need to initialize
>> PETSc using the group communicator. Other than that there are no real
>> differences.
>>
>
> That doesn't sound right. PETSc initialisation does not take a
> communicator. It is collective on MPI_COMM_WORLD, but each PETSc object
> takes a communicator at construction, which can be something other than
> MPI_COMM_WORLD or MPI_COMM_SELF.
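>
> A small petsc4py illustration of that point (the Vec and the world split
> are only for demonstration): the communicator is supplied per object, not
> at initialisation.
>
>     from mpi4py import MPI
>     from petsc4py import PETSc  # importing initialises PETSc if needed
>
>     sub_comm = MPI.COMM_WORLD.Split(color=MPI.COMM_WORLD.rank % 2)
>     x = PETSc.Vec().create(comm=sub_comm)  # this Vec lives on sub_comm only
>     x.setSizes(10)
>     x.setUp()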
>
> Garth
>
>
>> Johan
>>
>>
>>
>>> Garth
>>>
>>>
>>>
>>>  Please try it out and report any sharp edges. A demo would also be fun
>>>>>> to include :)
>>>>>>
>>>>>
>>>>> We could run tests on different communicators to speed them up on
>>>>> machines with high core counts!
>>>>>
>>>>
>>>> True!
>>>>
>>>> Johan
>>>>
>>>>
>>>>  Garth
>>>>>
>>>>>
>>>>>  Johan
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
