For the tests, assuming PyTest permits a 'setup' function like
unittest, would it help if MPI is initialised explicitly in the
setup
function and closed down at the end of a test suite (if possible
with
PyTest)?
Garth
>
> Jan
>
>>
>> We have had this problem in the past with the 'automatic'
>> finalisation of MPI, which is a problem if MPI is shutdown
before
>> PETSc.
>>
>> Garth
>>
>>
>> > 6. okt. 2014 12:18 skrev "Jan Blechta"
>> <[email protected]>
>> > følgende:
>> >> On Mon, 6 Oct 2014 12:07:02 +0200
>> >> Martin Sandve Alnæs <[email protected]> wrote:
>> >>
>> >> > The problem is that gc is nondeterministic and in
particular
>> not
>> >> > running with equal timing and ordering on each mpi process.
>> >> >
>> >> > We can't use the with statement to handle the scope of
every
>> >> > single dolfin object in a program.
>> >>
>> >> Most of the DOLFIN destructors are not collective. So the
>> >> moral
>> is
>> >> that
>> >> we should avoid collective destructors as possible and
>> >> document
>> it
>> >> like
>> >> it is in PETSc doc.
>> >>
>> >> Jan
>> >>
>> >> >
>> >> > We can change all file handling to use with, and require
the
>> >> > user
>> >> to
>> >> > use that in parallel.
>> >> > 6. okt. 2014 11:41 skrev "Jan Blechta"
>> >> <[email protected]>
>> >> > følgende:
>> >> >
>> >> > > On Mon, 6 Oct 2014 09:48:29 +0200
>> >> > > Martin Sandve Alnæs <[email protected]> wrote:
>> >> > >
>> >> > > > The 'fix' that's in the branch now was to trigger
python
>> >> garbage
>> >> > > > collection (suggested by Øyvind Evju) before each
test.
>> >> > > >
>> >> > > > This probably means we have a general problem in dolfin
>> with
>> >> > > > non-deterministic destruction order of objects in
>> >> > > > parallel. Any destructor that uses MPI represents a
>> >> > > > potential
>> deadlock.
>> >> > >
>> >> > > To understand the issue, is the problem that garbage
>> >> > > collection
>> >> does
>> >> > > not ensure when the object is destroyed which is the
>> >> > > problem?
>> >> > >
>> >> > > Here http://stackoverflow.com/a/5071376/1796717 the
>> distinction
>> >> > > between variable scoping and object cleanup is discussed.
>> >> Quoting it
>> >> > >
>> >> > > Deterministic cleanup happens through the with
statement.
>> >> > >
>> >> > > which might be a proper solution to the problem.
>> >> > >
>> >> > > Jan
>> >> > >
>> >> > > >
>> >> > > > On 19 September 2014 12:52, Jan Blechta
>> >> > > > <[email protected]> wrote:
>> >> > > >
>> >> > > > > On Fri, 19 Sep 2014 00:27:50 +0200
>> >> > > > > Jan Blechta <[email protected]> wrote:
>> >> > > > >
>> >> > > > > > Yes, after many trials using
>> >> > > > > >
>> >> > > > > > $ cd test/unit/io/python
>> >> > > > > > $ while true; do git clean -fdx && mpirun -n 3
xterm
>> >> > > > > > -e gdb -ex r -ex q -args python -m pytest -sv; done
>> >> > > > > > # when it hangs and you interrupt it, it asks for
>> >> > > > > > confirmation for # quitting, so you type n and
enjoy
>> >> > > > > > gdb...
>> >> > > > > >
>> >> > > > > > I've seen a situation when 2 processes deadlocked
on
>> >> > > > > > HDF5Interface::close_file() in DOLFIN with
backtrace
>> like
>> >> > > > > >
>> >> > > > > > # MPI barrier
>> >> > > > > > ...
>> >> > > > > > # MPI close
>> >> > > > > > # HDF5 lib calls
>> >> > > > > > H5FClose()
>> >> > > > > > dolfin::HDF5Interface::close_file()
>> >> > > > > > dolfin::HDF5File::close()
>> >> > > > > > dolfin::HDF5File::~HDF5File()
>> >> > > > > > dolfin::HDF5File::~HDF5File()
>> >> > > > > > # smart ptr management
>> >> > > > > > # garbage collection
>> >> > > > > >
>> >> > > > > > while 3rd process is waiting far away. Isn't it
>> >> > > > > > strange
>> >> that
>> >> > > > > > destructor is there twice in stacktrace? (The upper
>> >> > > > > > one is
>> >> on
>> >> > > > > > '}' line which I don't get.) What does it mean?
>> >> > > > >
>> >> > > > > Probably just code generation artifact - nothing
>> >> > > > > harmful, see
>> >> > > > > http://stackoverflow.com/a/15244091/1796717
>> >> > > > >
>> >> > > > > Jan
>> >> > > > >
>> >> > > > > >
>> >> > > > > > Jan
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > On Thu, 18 Sep 2014 16:20:51 +0200
>> >> > > > > > Martin Sandve Alnæs <[email protected]> wrote:
>> >> > > > > >
>> >> > > > > > > I've added the mpi fixes for temppath fixture and
>> fixed
>> >> > > > > > > some other related issues while at it: When
>> >> parameterizing
>> >> > > > > > > a test that uses a temppath fixture, there is a
>> >> > > > > > > need for separate directories for each parameter
>> >> > > > > > > combo. A further improvement would be automatic
>> >> > > > > > > cleaning of old tempdirs, but I leave that for
now.
>> >> > > > > > >
>> >> > > > > > > I've pushed these changes to the branch
>> >> > > > > > > aslakbergersen/topic-change-unittest-to-pytest
>> >> > > > > > >
>> >> > > > > > > The tests still hang though, in the closing of
>> >> > > > > > > HDF5File.
>> >> > > > > > >
>> >> > > > > > > Here's now to debug if someone wants to give it a
>> shot:
>> >> > > > > > > Just run:
>> >> > > > > > > mpirun -np 3 python -m pytest -s -v
>> >> > > > > > > With gdb:
>> >> > > > > > > mpirun -np 3 xterm -e gdb --args python -m
>> >> > > > > > > pytest then enter 'r' in each of the three
xterms.
>> >> > > > > > >
>> >> > > > > > > You may have to try a couple of times to get the
>> >> > > > > > > hanging behaviour.
>> >> > > > > > >
>> >> > > > > > > Martin
>> >> > > > > > >
>> >> > > > > > > On 18 September 2014 13:23, Martin Sandve Alnæs
>> >> > > > > > > <[email protected]> wrote:
>> >> > > > > > >
>> >> > > > > > > > Good spotting both of you, thanks.
>> >> > > > > > > >
>> >> > > > > > > > Martin
>> >> > > > > > > >
>> >> > > > > > > > On 18 September 2014 13:01, Lawrence Mitchell <
>> >> > > > > > > > [email protected]> wrote:
>> >> > > > > > > >
>> >> > > > > > > >> On 18/09/14 11:42, Jan Blechta wrote:
>> >> > > > > > > >> > Some problems (when running in a clean dir)
>> >> > > > > > > >> > are
>> >> avoided
>> >> > > > > > > >> > using this (although incorrect) patch. There
>> >> > > > > > > >> > are
>> >> race
>> >> > > > > > > >> > conditions in creation of temp dir. It
should
>> >> > > > > > > >> > be
>> >> done
>> >> > > > > > > >> > using atomic operation.
>> >> > > > > > > >> >
>> >> > > > > > > >> > Jan
>> >> > > > > > > >> >
>> >> > > > > > > >> >
>> >> > > > > > > >> >
>> >> > >
>> >>
>>
==================================================================
>> >> > > > > > > >> > diff --git
a/test/unit/io/python/test_XDMF.py
>> >> > > > > > > >> > b/test/unit/io/python/test_XDMF.py index
>> >> > > > > > > >> > 9ad65a4..31471f1 100755 ---
>> >> > > > > > > >> > a/test/unit/io/python/test_XDMF.py +++
>> >> > > > > > > >> > b/test/unit/io/python/test_XDMF.py @@ -28,8
>> +28,9
>> >> > > > > > > >> > @@ def temppath(): filedir =
>> >> > > > > > > >> > os.path.dirname(os.path.abspath(__file__))
>> >> > > > > > > >> > basename
>> >> =
>> >> > > > > > > >> > os.path.basename(__file__).replace(".py",
>> "_data")
>> >> > > > > > > >> > temppath = os.path.join(filedir, basename,
"")
>> >> > > > > > > >> > - if not os.path.exists(temppath):
>> >> > > > > > > >> > - os.mkdir(temppath)
>> >> > > > > > > >> > + if MPI.rank(mpi_comm_world()) == 0:
>> >> > > > > > > >> > + if not os.path.exists(temppath):
>> >> > > > > > > >> > + os.mkdir(temppath)
>> >> > > > > > > >> > return temppath
>> >> > > > > > > >>
>> >> > > > > > > >> There's still a race condition here because
>> >> > > > > > > >> ranks
>> >> other
>> >> > > > > > > >> than zero might try and use temppath before
it's
>> >> > > > > > > >> created. I think you want something like the
>> below:
>> >> > > > > > > >>
>> >> > > > > > > >> if MPI.rank(mpi_comm_world()) == 0:
>> >> > > > > > > >> if not os.path.exists(temppath):
>> >> > > > > > > >> os.mkdir(temppath)
>> >> > > > > > > >> MPI.barrier(mpi_comm_world())
>> >> > > > > > > >> return temppath
>> >> > > > > > > >>
>> >> > > > > > > >> If you're worried about the OS not creating
>> >> > > > > > > >> files atomically, you can always mkdir into a
>> >> > > > > > > >> tmp directory
>> >> and
>> >> > > > > > > >> then os.rename(tmp, temppath), since posix
>> >> > > > > > > >> guarantees that renames are atomic.
>> >> > > > > > > >>
>> >> > > > > > > >> Lawrence
>> >> > > > > > > >>
_______________________________________________
>> >> > > > > > > >> fenics mailing list
>> >> > > > > > > >> [email protected]
>> >> > > > > > > >>
http://fenicsproject.org/mailman/listinfo/fenics
>> >> > > > > > > >>
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > >
>> >> > > > > > _______________________________________________
>> >> > > > > > fenics mailing list
>> >> > > > > > [email protected]
>> >> > > > > > http://fenicsproject.org/mailman/listinfo/fenics
>> >> > > > >
>> >> > > > >
>> >> > >
>> >> > >
>> >>
>
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics