On Mon, 6 Oct 2014 12:07:02 +0200
Martin Sandve Alnæs <[email protected]> wrote:

> The problem is that gc is nondeterministic and in particular not
> running with equal timing and ordering on each mpi process.
> 
> We can't use the with statement to handle the scope of every single
> dolfin object in a program.

Most of the DOLFIN destructors are not collective. So the moral is that
we should avoid collective destructors as much as possible and document
the remaining ones, as the PETSc documentation does.
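
For the cases where a collective close cannot be avoided (file
handling, as suggested in the quoted text), here is a minimal sketch of
deterministic cleanup with the with statement. CollectiveResource and
write_two_files are hypothetical names, not DOLFIN API, and close()
only records the call where a real class would do the collective work.

```python
class CollectiveResource(object):
    """Hypothetical stand-in for an object whose close() is collective."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.closed = False

    def close(self):
        # A real class would make its collective call here (for example
        # closing a parallel file). Guard against closing twice.
        if not self.closed:
            self.closed = True
            self.log.append("closed " + self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block is left, on every process at the same
        # point in the program, independent of garbage collection.
        self.close()
        return False  # do not swallow exceptions


def write_two_files(log):
    with CollectiveResource("a.h5", log) as a:
        log.append("using " + a.name)
    with CollectiveResource("b.h5", log) as b:
        log.append("using " + b.name)
    return log


events = write_two_files([])
```

The point of the sketch is only that close() happens at a known place
in the program text, so all processes reach it in the same order.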

Jan

> 
> We can change all file handling to use with, and require the user to
> use that in parallel.
>  On 6 Oct 2014 at 11:41, "Jan Blechta" <[email protected]>
> wrote:
> 
> > On Mon, 6 Oct 2014 09:48:29 +0200
> > Martin Sandve Alnæs <[email protected]> wrote:
> >
> > > The 'fix' that's in the branch now was to trigger python garbage
> > > collection (suggested by Øyvind Evju) before each test.
> > >
> > > This probably means we have a general problem in dolfin with
> > > non-deterministic destruction order of objects in parallel. Any
> > > destructor that uses MPI represents a potential deadlock.
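
A small sketch of why destruction timing is not tied to scope. It
assumes CPython 3.4 or later (where objects with __del__ that sit in
reference cycles are finalized by the collector); Node and make_cycle
are made-up names for the demonstration.

```python
import gc

finalized = []


class Node(object):
    """Object whose destructor records that it ran."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)


def make_cycle():
    # The two objects reference each other, so reference counting alone
    # cannot free them when the function returns.
    a, b = Node("a"), Node("b")
    a.partner, b.partner = b, a


gc.collect()   # start from a clean state
gc.disable()   # switch off automatic collection to keep the demo repeatable
make_cycle()
pending = list(finalized)   # snapshot: destructors have not run yet
gc.collect()                # explicit collection runs them now
gc.enable()
after = sorted(finalized)
```

With automatic collection enabled, the moment at which such
destructors run depends on allocation counts in each process, which is
why an explicit gc.collect() at a common point changes the behaviour.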
> >
> > To understand the issue: is the problem that garbage collection does
> > not guarantee when an object is destroyed?
> >
> > Here http://stackoverflow.com/a/5071376/1796717 the distinction
> > between variable scoping and object cleanup is discussed. Quoting it
> >
> >   Deterministic cleanup happens through the with statement.
> >
> > which might be a proper solution to the problem.
> >
> > Jan
> >
> > >
> > > On 19 September 2014 12:52, Jan Blechta
> > > <[email protected]> wrote:
> > >
> > > > On Fri, 19 Sep 2014 00:27:50 +0200
> > > > Jan Blechta <[email protected]> wrote:
> > > >
> > > > > Yes, after many trials using
> > > > >
> > > > > $ cd test/unit/io/python
> > > > > > $ while true; do git clean -fdx && mpirun -n 3 xterm -e gdb \
> > > > > >       -ex r -ex q -args python -m pytest -sv; done
> > > > > > # when it hangs and you interrupt it, it asks for confirmation
> > > > > > # for quitting, so you type n and enjoy gdb...
> > > > >
> > > > > I've seen a situation when 2 processes deadlocked on
> > > > > HDF5Interface::close_file() in DOLFIN with backtrace like
> > > > >
> > > > > # MPI barrier
> > > > > ...
> > > > > # MPI close
> > > > > # HDF5 lib calls
> > > > > H5FClose()
> > > > > dolfin::HDF5Interface::close_file()
> > > > > dolfin::HDF5File::close()
> > > > > dolfin::HDF5File::~HDF5File()
> > > > > dolfin::HDF5File::~HDF5File()
> > > > > # smart ptr management
> > > > > # garbage collection
> > > > >
> > > > > > while the 3rd process is waiting far away. Isn't it strange that
> > > > > > the destructor appears twice in the stack trace? (The upper one
> > > > > > is on the '}' line, which I don't get.) What does it mean?
> > > >
> > > > Probably just a code generation artifact, nothing harmful; see
> > > > http://stackoverflow.com/a/15244091/1796717
> > > >
> > > > Jan
> > > >
> > > > >
> > > > > Jan
> > > > >
> > > > >
> > > > > On Thu, 18 Sep 2014 16:20:51 +0200
> > > > > Martin Sandve Alnæs <[email protected]> wrote:
> > > > >
> > > > > > I've added the mpi fixes for temppath fixture and fixed
> > > > > > some other related issues while at it: When parameterizing
> > > > > > a test that uses a temppath fixture, there is a need for
> > > > > > separate directories for each parameter combo.
> > > > > > A further improvement would be automatic cleaning of old
> > > > > > tempdirs, but I leave that for now.
> > > > > >
> > > > > > I've pushed these changes to the branch
> > > > > > aslakbergersen/topic-change-unittest-to-pytest
> > > > > >
> > > > > > The tests still hang though, in the closing of HDF5File.
> > > > > >
> > > > > > > Here's how to debug if someone wants to give it a shot:
> > > > > > Just run:
> > > > > >     mpirun -np 3 python -m pytest -s -v
> > > > > > With gdb:
> > > > > >     mpirun -np 3 xterm -e gdb --args python -m pytest
> > > > > > then enter 'r' in each of the three xterms.
> > > > > >
> > > > > > You may have to try a couple of times to get the hanging
> > > > > > behaviour.
> > > > > >
> > > > > > Martin
> > > > > >
> > > > > > On 18 September 2014 13:23, Martin Sandve Alnæs
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Good spotting both of you, thanks.
> > > > > > >
> > > > > > > Martin
> > > > > > >
> > > > > > > On 18 September 2014 13:01, Lawrence Mitchell <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > >> On 18/09/14 11:42, Jan Blechta wrote:
> > > > > > > >> > Some problems (when running in a clean dir) are avoided
> > > > > > > >> > using this (although incorrect) patch. There are race
> > > > > > > >> > conditions in the creation of the temp dir. It should be
> > > > > > > >> > done using an atomic operation.
> > > > > > >> >
> > > > > > >> > Jan
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > > >> > ==================================================================
> > > > > > > >> > diff --git a/test/unit/io/python/test_XDMF.py b/test/unit/io/python/test_XDMF.py
> > > > > > > >> > index 9ad65a4..31471f1 100755
> > > > > > > >> > --- a/test/unit/io/python/test_XDMF.py
> > > > > > > >> > +++ b/test/unit/io/python/test_XDMF.py
> > > > > > > >> > @@ -28,8 +28,9 @@ def temppath():
> > > > > > > >> >      filedir = os.path.dirname(os.path.abspath(__file__))
> > > > > > > >> >      basename = os.path.basename(__file__).replace(".py", "_data")
> > > > > > > >> >      temppath = os.path.join(filedir, basename, "")
> > > > > > > >> > -    if not os.path.exists(temppath):
> > > > > > > >> > -        os.mkdir(temppath)
> > > > > > > >> > +    if MPI.rank(mpi_comm_world()) == 0:
> > > > > > > >> > +        if not os.path.exists(temppath):
> > > > > > > >> > +            os.mkdir(temppath)
> > > > > > > >> >      return temppath
> > > > > > >>
> > > > > > >> There's still a race condition here because ranks other
> > > > > > >> than zero might try and use temppath before it's
> > > > > > >> created.  I think you want something like the below:
> > > > > > >>
> > > > > > >> if MPI.rank(mpi_comm_world()) == 0:
> > > > > > >>     if not os.path.exists(temppath):
> > > > > > >>         os.mkdir(temppath)
> > > > > > >> MPI.barrier(mpi_comm_world())
> > > > > > >> return temppath
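
The create-then-barrier pattern above can be tried without an MPI
installation. In this sketch threads stand in for MPI ranks and
threading.Barrier stands in for MPI.barrier; rank_main and run are
made-up names, and the directory name is arbitrary.

```python
import os
import tempfile
import threading


def rank_main(rank, barrier, path, seen):
    # Only "rank" 0 creates the directory.
    if rank == 0:
        if not os.path.exists(path):
            os.mkdir(path)
    # Nobody passes this point until every rank, including rank 0,
    # has reached it, so the directory exists afterwards.
    barrier.wait()
    seen[rank] = os.path.isdir(path)


def run(nranks=3):
    with tempfile.TemporaryDirectory() as parent:
        path = os.path.join(parent, "test_data")
        barrier = threading.Barrier(nranks)
        seen = [None] * nranks
        threads = [
            threading.Thread(target=rank_main, args=(r, barrier, path, seen))
            for r in range(nranks)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return seen


results = run()
```

Without the barrier.wait() line, the other ranks could check the path
before rank 0 has created it, which is the race described above.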
> > > > > > >>
> > > > > > >> If you're worried about the OS not creating files
> > > > > > >> atomically, you can always mkdir into a tmp directory and
> > > > > > >> then os.rename(tmp, temppath), since posix guarantees
> > > > > > >> that renames are atomic.
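
A rough sketch of the mkdir-then-rename idea, under a few assumptions:
ensure_dir_atomically is a hypothetical helper, the parent directory
already exists, and the file system gives POSIX rename semantics (an
empty target directory may be replaced, a non-empty one makes the
rename fail).

```python
import os
import tempfile


def ensure_dir_atomically(target):
    """Create directory `target` so that it appears all at once."""
    target = os.path.abspath(target)  # also drops a trailing separator
    if os.path.isdir(target):
        return target
    parent = os.path.dirname(target)
    # Create a uniquely named scratch directory next to the target.
    tmp = tempfile.mkdtemp(prefix=".tmp-", dir=parent)
    try:
        # The rename makes the directory visible under its final name
        # in a single step.
        os.rename(tmp, target)
    except OSError:
        # Someone else got there first: discard our scratch directory
        # and accept theirs if it is really there.
        os.rmdir(tmp)
        if not os.path.isdir(target):
            raise
    return target
```

This only addresses creation itself; ranks that do not call the helper
still need the barrier before they use the directory.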
> > > > > > >>
> > > > > > >> Lawrence
> > > > > > >> _______________________________________________
> > > > > > >> fenics mailing list
> > > > > > >> [email protected]
> > > > > > >> http://fenicsproject.org/mailman/listinfo/fenics
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > > >
> >
> >

