On Wed, 3 Dec 2014 21:07:37 +0000
David Holloway <[email protected]> wrote:

> Hi Jan,
> 
> My computations aren't parallel, and I don't believe I have OpenMPI
> installed - I can't find the file SubsystemManager.cpp mentioned in
> the bitbucket issue.

You must have OpenMPI; DOLFIN from the PPA is compiled against it. This
looks like an OpenMPI issue, so you should refer to the OpenMPI mailing
list. This

> Unable to start a daemon on the local node (-128)

is the keyword. Nevertheless, can you try the following?

 $ # check which MPI libdolfin links against, and where libmpi lives
 $ ldd /usr/lib/libdolfin.so | grep -i mpi
 $ locate libmpi.so

 $ # Which of the following fails?
 $ mpirun echo hello world
 $ python -c"import dolfin"
 $ mpirun -n 2 python -c"import dolfin"
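
Also, the backtrace itself dies inside hwloc_topology_load, i.e. in
hwloc's hardware topology detection, before MPI proper even starts. A
minimal check, assuming the hwloc command-line utilities are installed
(they should be in the hwloc package on Ubuntu):

 $ # if lstopo hits the same integer divide-by-zero, the bug is in
 $ # libhwloc's topology detection, not in DOLFIN or Open MPI
 $ lstopo --version
 $ lstopo

If lstopo crashes the same way, I would try upgrading or reinstalling
the hwloc library (libhwloc.so.5, according to your backtrace) before
anything else.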

There may be a conflict between libmpi versions, but I'm not sure how
best to debug it.
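
A rough sketch of how one might look for such a conflict; nothing here
is DOLFIN-specific, just standard tooling:

 $ # which Open MPI does mpirun itself resolve to?
 $ readlink -f $(which mpirun)
 $ ompi_info | head -n 5
 $ # every libmpi the dynamic linker knows about; more than one
 $ # entry may indicate two competing installations
 $ ldconfig -p | grep libmpi
 $ # installed MPI-related packages
 $ dpkg -l | grep -E "openmpi|mpich|libmpi"

If the ldd output above points at a different libmpi than the one
mpirun resolves to, that mismatch would be my prime suspect.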

Jan

> 
> I also get these 'MPI' crashes when I run the gmsh executable from
> the Linux shell command line. I also get a very similar crash when
> running my own .py (which calls dolfin) from a different directory,
> one that doesn't read the 'dolfin_parameters.xml' shown below.
> 
> I have done the following to uninstall/reinstall FEniCS:
> 
> sudo apt-get remove fenics*
> sudo apt-get --purge autoremove fenics
> sudo apt-get update
> sudo add-apt-repository ppa:fenics-packages/fenics
> sudo apt-get update
> sudo apt-get install fenics
> sudo apt-get dist-upgrade
> 
> (I also did this process with ipython, mayavi2 and gmsh.)
> 
> Thank you,
> 
> David
> 
> 
> What is your OpenMPI version? It could be
> https://bitbucket.org/fenics-project/dolfin/issue/384
> 
> Jan
> 
> 
> On Wed, 3 Dec 2014 16:27:25 +0000
> David Holloway <[email protected]> wrote:
> 
> > Hi Jan,
> >
> > Thank you - here's a screen dump from trying to run a FEniCS
> > demo .py:
> >
> > In [1]: run d1_p2D.py
> > Reading DOLFIN parameters from file "dolfin_parameters.xml".
> > [ubuntu:03039] *** Process received signal ***
> > [ubuntu:03039] Signal: Floating point exception (8)
> > [ubuntu:03039] Signal code: Integer divide-by-zero (1)
> > [ubuntu:03039] Failing at address: 0xb74e7da0
> > [ubuntu:03039] [ 0] [0xb77bb40c]
> > [ubuntu:03039] [ 1] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2cda0) [0xb74e7da0]
> > [ubuntu:03039] [ 2] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2e71c) [0xb74e971c]
> > [ubuntu:03039] [ 3] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2ea8b) [0xb74e9a8b]
> > [ubuntu:03039] [ 4] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x98f6) [0xb74c48f6]
> > [ubuntu:03039] [ 5] /usr/lib/i386-linux-gnu/libhwloc.so.5(hwloc_topology_load+0x1c6) [0xb74c58ec]
> > [ubuntu:03039] [ 6] /usr/lib/libopen-rte.so.4(orte_odls_base_open+0x7b1) [0xb770c881]
> > [ubuntu:03039] [ 7] /usr/lib/openmpi/lib/openmpi/mca_ess_hnp.so(+0x2445) [0xb7797445]
> > [ubuntu:03039] [ 8] /usr/lib/libopen-rte.so.4(orte_init+0x1cf) [0xb76e1b3f]
> > [ubuntu:03039] [ 9] /usr/lib/libopen-rte.so.4(orte_daemon+0x256) [0xb76fe1c6]
> > [ubuntu:03039] [10] orted() [0x80485b3]
> > [ubuntu:03039] [11] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb7515a83]
> > [ubuntu:03039] [12] orted() [0x80485f8]
> > [ubuntu:03039] *** End of error message ***
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 343
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 140
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 128
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel
> > process is likely to abort.  There are many reasons that a parallel
> > process can fail during orte_init; some of which are due to
> > configuration or environment problems.  This failure appears to be
> > an internal failure; here's some additional information (which may
> > only be relevant to an Open MPI developer):
> >
> >   orte_ess_set_name failed
> >   --> Returned value Unable to start a daemon on the local node
> >       (-128) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel
> > process is likely to abort.  There are many reasons that a parallel
> > process can fail during MPI_INIT; some of which are due to
> > configuration or environment problems.  This failure appears to be
> > an internal failure; here's some additional information (which may
> > only be relevant to an Open MPI developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned "Unable to start a daemon on the local node" (-128)
> >       instead of "Success" (0)
> > --------------------------------------------------------------------------
> > [ubuntu:3035] *** An error occurred in MPI_Init_thread
> > [ubuntu:3035] *** on a NULL communicator
> > [ubuntu:3035] *** Unknown error
> > [ubuntu:3035] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> > --------------------------------------------------------------------------
> > An MPI process is aborting at a time when it cannot guarantee
> > that all of its peer processes in the job will be killed properly.
> > You should double check that everything has shut down cleanly.
> >
> >   Reason:     Before MPI_INIT completed
> >   Local host: ubuntu
> >   PID:        3035
> > --------------------------------------------------------------------------
> 

_______________________________________________
fenics-support mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics-support
