Btw, run it with -np 2
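
For example, if the quoted script below is saved as test_jit_groups.py (the
file name is just a placeholder):

  mpirun -np 2 python test_jit_groups.py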

Johan

On Wed, Nov 26, 2014 at 11:50 AM, Johan Hake <[email protected]> wrote:

> import mpi4py.MPI as mpi
> import petsc4py.PETSc as petsc
>
> # Set communicator and get process information
> comm = mpi.COMM_WORLD
> group = comm.Get_group()
> size = comm.Get_size()
>
> # Divide the processes into two groups: the first with rank 0 and the
> # second with rank 1 (hence run with -np 2)
> rank = comm.Get_rank()
> group_comm_0 = petsc.Comm(comm.Create(group.Incl(range(1))))
> group_comm_1 = petsc.Comm(comm.Create(group.Incl(range(1,2))))
>
> from dolfin import Expression, UnitSquareMesh, Function, TestFunction, \
>      Form, FunctionSpace, dx, CompiledSubDomain, SubSystemsManager
>
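> # With deadlock = True, SubSystemsManager.init_petsc() is skipped here and
> # the group-local construction below deadlocks; set it to False to
> # initialize PETSc (and SLEPc) collectively on all processes up front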
> deadlock = True
> if not deadlock:
>     SubSystemsManager.init_petsc()
>
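> # Rank 0 JITs an Expression on the first group; the other rank builds a
> # mesh, a Function and a Form on the second group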
> if rank == 0:
>     e = Expression("4", mpi_comm=group_comm_0)
>
> else:
>     mesh = UnitSquareMesh(group_comm_1, 2, 2)
>     V = FunctionSpace(mesh, "P", 1)
>
>     # If SubSystemsManager has not initialized PETSc, SLEPc will be
>     # initialized when creating the vector in Function. That is done
>     # collectively on PETSC_COMM_WORLD == MPI_COMM_WORLD in this
>     # example
>     u = Function(V)
>     v = TestFunction(V)
>     Form(u*v*dx)
>
> On Wed, Nov 26, 2014 at 11:31 AM, Garth N. Wells <[email protected]> wrote:
>
>> On Wed, 26 Nov, 2014 at 10:24 AM, Johan Hake <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Nov 26, 2014 at 10:52 AM, Garth N. Wells <[email protected]>
>>> wrote:
>>>
>>>>
>>>> On Wed, 26 Nov, 2014 at 9:38 AM, Johan Hake <[email protected]> wrote:
>>>>
>>>>> On Wed, Nov 26, 2014 at 10:28 AM, Garth N. Wells <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Wed, 26 Nov, 2014 at 9:09 AM, Johan Hake <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Wed, Nov 26, 2014 at 9:43 AM, Garth N. Wells <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Wed, 26 Nov, 2014 at 8:32 AM, Johan Hake <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Nov 26, 2014 at 9:22 AM, Garth N. Wells <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, 26 Nov, 2014 at 7:50 AM, Johan Hake <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 26, 2014 at 8:34 AM, Garth N. Wells <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 25 Nov, 2014 at 9:48 PM, Johan Hake <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I just pushed some fixes to the jit interface of DOLFIN. Now
>>>>>>>>>>>>> one can jit on different mpi groups.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Nice.
>>>>>>>>>>>>
>>>>>>>>>>>>  Previously JITing was only done on rank 1 of the
>>>>>>>>>>>>> mpi_comm_world. Now it is done on rank 1 of any passed group
>>>>>>>>>>>>> communicator.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Do you mean rank 0?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yes, of course.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  There is no demo atm showing this but a test has been added:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   test/unit/python/jit/test_jit_with_mpi_groups.py
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here an expression, a subdomain, and a form are constructed on
>>>>>>>>>>>>> different ranks using groups. It is somewhat tedious, as one needs
>>>>>>>>>>>>> to initialize PETSc with the same group; otherwise PETSc will
>>>>>>>>>>>>> deadlock during initialization (the moment a PETSc linear algebra
>>>>>>>>>>>>> object is constructed).
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is ok. It's arguably a design flaw that we don't make the
>>>>>>>>>>>> user handle MPI initialisation manually.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sure, it is just somewhat tedious. You cannot start your
>>>>>>>>>>> typical script by importing dolfin.
>>>>>>>>>>>
>>>>>>>>>>>  The procedure in Python for this is (see the sketch below):
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Construct MPI groups using mpi4py
>>>>>>>>>>>>> 2) Initialize petsc4py using the groups
>>>>>>>>>>>>> 3) Wrap the group communicators as petsc4py comms (DOLFIN only
>>>>>>>>>>>>>    supports petsc4py, not mpi4py)
>>>>>>>>>>>>> 4) import dolfin
>>>>>>>>>>>>> 5) Do group-specific stuff:
>>>>>>>>>>>>>    a) Functions and forms: no change needed, as the communicator
>>>>>>>>>>>>>       is passed via the mesh
>>>>>>>>>>>>>    b) domain = CompiledSubDomain("...", mpi_comm=group_comm)
>>>>>>>>>>>>>    c) e = Expression("...", mpi_comm=group_comm)
>>>>>>>>>>>>>
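>>>>>>>>>>>>> A minimal sketch of steps 1-5, assuming two groups (rank 0 and
>>>>>>>>>>>>> the remaining ranks); the variable names and the expression and
>>>>>>>>>>>>> subdomain strings are just placeholders:
>>>>>>>>>>>>>
>>>>>>>>>>>>> import mpi4py.MPI as mpi
>>>>>>>>>>>>>
>>>>>>>>>>>>> # 1) Construct MPI groups using mpi4py
>>>>>>>>>>>>> world = mpi.COMM_WORLD
>>>>>>>>>>>>> group = world.Get_group()
>>>>>>>>>>>>> rank = world.Get_rank()
>>>>>>>>>>>>> size = world.Get_size()
>>>>>>>>>>>>> comm_0 = world.Create(group.Incl([0]))
>>>>>>>>>>>>> comm_1 = world.Create(group.Incl(list(range(1, size))))
>>>>>>>>>>>>> my_comm = comm_0 if rank == 0 else comm_1
>>>>>>>>>>>>>
>>>>>>>>>>>>> # 2) Initialize petsc4py with the group communicator (this sets
>>>>>>>>>>>>> #    PETSC_COMM_WORLD before PETSc is initialized)
>>>>>>>>>>>>> import petsc4py
>>>>>>>>>>>>> petsc4py.init(comm=my_comm)
>>>>>>>>>>>>>
>>>>>>>>>>>>> # 3) Wrap the group communicator as a petsc4py comm
>>>>>>>>>>>>> import petsc4py.PETSc as petsc
>>>>>>>>>>>>> group_comm = petsc.Comm(my_comm)
>>>>>>>>>>>>>
>>>>>>>>>>>>> # 4) Import dolfin only after PETSc has been set up
>>>>>>>>>>>>> from dolfin import Expression, CompiledSubDomain
>>>>>>>>>>>>>
>>>>>>>>>>>>> # 5) Group-specific objects; JITing happens on rank 0 of each group
>>>>>>>>>>>>> domain = CompiledSubDomain("near(x[0], 0.0)", mpi_comm=group_comm)
>>>>>>>>>>>>> e = Expression("1.0", mpi_comm=group_comm)
>>>>>>>>>>>>>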
>>>>>>>>>>>>
>>>>>>>>>>>> It's not so clear whether passing the communicator means that
>>>>>>>>>>>> the Expression is only defined/available on group_comm, or if 
>>>>>>>>>>>> group_comm is
>>>>>>>>>>>> simply to control who does the JIT. Could you clarify this?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My knowledge of MPI is not that good. I have only tried to
>>>>>>>>>>> access (and construct) the Expression on ranks included in that
>>>>>>>>>>> group. Also, when I tried to construct one using a group
>>>>>>>>>>> communicator on a rank that is not included in the group, I got an
>>>>>>>>>>> error when calling MPI_size on it. There is probably a perfectly
>>>>>>>>>>> reasonable explanation for this.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Could you clarify what goes on behind-the-scenes with the
>>>>>>>>>> communicator? Is it only used in a call to get the process rank? 
>>>>>>>>>> What do
>>>>>>>>>> the ranks other than zero do?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Not sure what you want to know. Instead of using mpi_comm_world
>>>>>>>>> to construct meshes you use the group communicator. This communicator
>>>>>>>>> has its own local group of ranks. JITing is still done on rank 0 of
>>>>>>>>> the local group, which might be, and most often is, different from
>>>>>>>>> the rank 0 process of mpi_comm_world.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I just want to be clear (and have in the docstring) that
>>>>>>>>
>>>>>>>>    e = Expression("...", mpi_comm=group_comm)
>>>>>>>>
>>>>>>>> is valid only on group_comm (if this is the case), or make clear
>>>>>>>> that the communicator only determines the process that does the JIT.
>>>>>>>>
>>>>>>>
>>>>>>> I see now what you mean. I can update the docstring. As far as I
>>>>>>> understand, it should be that the expression is only valid on group_comm,
>>>>>>> and that rank 0 of that group takes care of the JIT.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> OK, could you make this clear in the docstring?
>>>>>>
>>>>>
>>>>> Sure.
>>>>>
>>>>>  If we required all Expressions to have a domain/mesh, as Martin
>>>>>>>> advocates, things would be clearer.
>>>>>>>>
>>>>>>>
>>>>>>> Sure, but the same question is there for the mesh too. Is it
>>>>>>> available on ranks that are not in the group?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> I think in this case it is clear - a mesh lives only on the processes
>>>>>> belonging to its communicator. The ambiguity with an Expression is that
>>>>>> it doesn't have any data that lives on processes.
>>>>>>
>>>>>
>>>>> Sure.
>>>>>
>>>>>  The group communicator works exactly like the world communicator but
>>>>>>>>> now on just a subset of the processes. There were some sharp edges,
>>>>>>>>> with deadlocks as a consequence, when barriers were taken on the world
>>>>>>>>> communicator. This is done by default when dolfin is imported and PETSc
>>>>>>>>> gets initialized with the world communicator. So we need to initialize
>>>>>>>>> PETSc using the group communicator. Other than that there are no real
>>>>>>>>> differences.
>>>>>>>>>
>>>>>>>>
>>>>>>>> That doesn't sound right. PETSc initialisation does not take a
>>>>>>>> communicator. It is collective on MPI_COMM_WORLD, but each PETSc object
>>>>>>>> takes a communicator at construction, which can be something other than
>>>>>>>> MPI_COMM_WORLD or MPI_COMM_SELF.
>>>>>>>>
>>>>>>>
>>>>>>> Well, for all I know PETSc can be initialized with an mpi_comm. In
>>>>>>> petsc4py that is done by:
>>>>>>>
>>>>>>>   import petsc4py
>>>>>>>   petsc4py.init(comm=group_1)
>>>>>>>   import petsc4py.PETSc as petsc
>>>>>>>
>>>>>>> It turned out that this was required for the Function constructor to
>>>>>>> not deadlock. The line:
>>>>>>>
>>>>>>>     _vector = factory.create_vector();
>>>>>>>
>>>>>>> initializes PETSc with world_comm, which obviously deadlocks.
>>>>>>>
>>>>>>
>>>>>> There must be something else wrong. PetscInitialize does not take a
>>>>>> communicator:
>>>>>>
>>>>>>    http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html
>>>>>>
>>>>>
>>>>>
>>>>> From that web page:
>>>>>
>>>>>   Collective on MPI_COMM_WORLD or PETSC_COMM_WORLD if it has been set
>>>>>
>>>>> So setting PETSC_COMM_WORLD initializes PETSc on a subset of
>>>>> processes.
>>>>>
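>>>>> A small petsc4py sketch of this; the singleton sub-communicator is just
>>>>> for illustration:
>>>>>
>>>>> import mpi4py.MPI as mpi
>>>>> import petsc4py
>>>>>
>>>>> # A sub-communicator containing only rank 0 (MPI_COMM_NULL elsewhere)
>>>>> sub = mpi.COMM_WORLD.Create(mpi.COMM_WORLD.Get_group().Incl([0]))
>>>>>
>>>>> if sub != mpi.COMM_NULL:
>>>>>     # init(comm=...) sets PETSC_COMM_WORLD before PETSc is initialized,
>>>>>     # so PETSc is effectively set up on the subset only
>>>>>     petsc4py.init(comm=sub)
>>>>>     from petsc4py import PETSc
>>>>>     print(PETSc.COMM_WORLD.getSize())  # 1, not the size of MPI_COMM_WORLD
>>>>>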
>>>>
>>>> It is fundamentally wrong that PETSc is 'initialised' with a
>>>> sub-communicator in DOLFIN to avoid a deadlock for the code use sketched in
>>>> this thread. Communicators belong to objects. If PETSc were initialised on a
>>>> sub-communicator, it would not be possible to solve a subsequent problem on
>>>> more processes than contained in the sub-communicator.
>>>>
>>>> If the code is deadlocking, then the underlying problem should be
>>>> fixed. It's probably a function call somewhere that is erroneously not
>>>> passing the communicator (since in cases where a communicator is not
>>>> passed, we assume it to be MPI_COMM_WORLD - not ideal but keeps it simple
>>>> for most users).
>>>>
>>>
>>> Found the bugger!
>>>
>>> If the first PETSc object is created only on a group,
>>> SubSystemsManager::init_petsc is called. This won't initialize PETSc if it
>>> is already initialized outside, for example with MPI_COMM_WORLD. However,
>>> SLEPc will be initialized using PETSC_COMM_WORLD. If that is set to
>>> MPI_COMM_WORLD, we have a deadlock. If we, however, create a PETScVector on
>>> all processes before we create the group one, we are fine, as SLEPc then
>>> gets initialized on all processes.
>>>
>>> So the solution is to call:
>>>
>>> SubSystemsManager.init_petsc()
>>>
>>> on all processes before doing anything.
>>>
>>
>> I don't quite follow. Could you post a simple snippet that deadlocks
>> when SubSystemsManager.init_petsc() is not called, but works when it is
>> called?
>>
>> Garth
>>
>>
>>  Well, I learned a bunch of MPI related stuff here :) I will update the
>>> tests accordingly.
>>>
>>> Johan
>>>
>>>
>>>
>>>>
>>>> Garth
>>>>
>>>>
>>>>
>>>>  Also see:
>>>>>
>>>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PETSC_COMM_WORLD.html
>>>>>
>>>>> Johan
>>>>>
>>>>>
>>>>>  Why does petsc4py want one? It doesn't make sense to initialise it
>>>>>> with a communicator - a communicator belongs to objects.
>>>>>>
>>>>>> Garth
>>>>>>
>>>>>>
>>>>>>  You might say that this could be avoided by initializing PETSc on
>>>>>>> all ranks with the world communicator before constructing a Function on 
>>>>>>> a
>>>>>>> group. However it still deadlocks during construction. Here I have just
>>>>>>> assumed it deadlock at the same line, but I need to double check this. 
>>>>>>> And
>>>>>>> when I initilized PETSc using the group communicator it just worked. So
>>>>>>> somewhere a collective call to mpi_world_comm is executed when 
>>>>>>> constructing
>>>>>>> a PETScVector.
>>>>>>>
>>>>>>> Johan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Garth
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Johan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Garth
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  Please try it out and report any sharp edges. A demo would also
>>>>>>>>>>>>> be fun to include :)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> We could run tests on different communicators to speed them up
>>>>>>>>>>>> on machines with high core counts!
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> True!
>>>>>>>>>>>
>>>>>>>>>>> Johan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  Garth
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  Johan
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
