Hi Jan, 

You were right, it was an OpenMPI problem: for example, "mpirun" itself failed. 
I tried installing a newer OpenMPI and that failed too (it only succeeded with a 
too-old version, 1.6). 

The fix: a completely fresh installation of Ubuntu 14.04, 64-bit. The in-place 
upgrade from Ubuntu 12 to Ubuntu 14 (both 32-bit) had broken, or failed to 
upgrade, some critical libraries (e.g. MPI). With a fresh Ubuntu, the PPA FEniCS 
packages installed fine. 
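
In case it helps anyone who finds this thread later: the sanity checks worth 
re-running on a fresh install are essentially your diagnostic commands from 
further down the thread (the libdolfin path below is just the default PPA 
location and may differ on other setups):

 $ mpirun echo hello world                  # bare OpenMPI launch
 $ python -c"import dolfin"                 # serial DOLFIN import
 $ mpirun -n 2 python -c"import dolfin"     # DOLFIN under mpirun
 $ ldd /usr/lib/libdolfin.so | grep -i mpi  # which libmpi DOLFIN links against

If any of these fails, the problem is below FEniCS, in the MPI stack itself.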

Thank you very much for your tips - they were definitely critical in pointing me 
in the right direction! It's a big relief to have my FEniCS code running again!

Cheers, 

David

-----Original Message-----
From: Jan Blechta [mailto:[email protected]] 
Sent: Thursday, December 04, 2014 4:54 AM
To: David Holloway
Cc: [email protected]
Subject: Re: [FEniCS-support] dolfin/fenics in ubuntu 14

On Wed, 3 Dec 2014 21:07:37 +0000
David Holloway <[email protected]> wrote:

> Hi Jan,
> 
> My computations aren't parallel, and I don't believe I have OpenMPI 
> installed - I can't find the file SubsystemManager.cpp mentioned in 
> the bitbucket issue.

You must have OpenMPI; DOLFIN from the PPA is compiled against it. It looks like 
an OpenMPI issue - you should refer it to the OpenMPI mailing list. This

> Unable to start a daemon on the local node (-128)

is the keyword to search for. Nevertheless, can you try

 $ ldd /usr/lib/libdolfin.so | grep -i mpi
 $ locate libmpi.so

 $ # Which of the following fails?
 $ mpirun echo hello world
 $ python -c"import dolfin"
 $ mpirun -n 2 python -c"import dolfin"

There may be some conflict in libmpi versions, but I'm not sure how to debug it.
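
One way to look for such a conflict (just a starting point, not something I have 
verified on your setup) is to compare what the dynamic linker and the package 
manager report:

 $ ldconfig -p | grep libmpi    # every libmpi.so* the dynamic linker can see
 $ dpkg -l | grep -i openmpi    # which OpenMPI packages are installed
 $ mpirun --version             # which Open MPI the mpirun on PATH belongs to

If the libmpi that libdolfin.so links against (from the ldd above) is not the 
one your mpirun comes from, that mismatch would be the first thing to fix.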

Jan

> 
> I also get these 'MPI' crashes when I run the gmsh executable from the 
> Linux shell command line, and a very similar crash when trying to run 
> my own .py (which calls dolfin) from a different directory, one which 
> doesn't read the 'dolfin_parameters.xml' shown below.
> 
> I have done the following to uninstall/reinstall FEniCS:
> 
> sudo apt-get remove fenics*
> sudo apt-get --purge autoremove fenics
> sudo apt-get update
> sudo add-apt-repository ppa:fenics-packages/fenics
> sudo apt-get update
> sudo apt-get install fenics
> sudo apt-get dist-upgrade
> 
> (I also did this process with ipython, mayavi2 and gmsh.)
> 
> Thank you,
> 
> David
> 
> 
> What is your OpenMPI version? It could be
> https://bitbucket.org/fenics-project/dolfin/issue/384
> 
> Jan
> 
> 
> On Wed, 3 Dec 2014 16:27:25 +0000
> David Holloway <[email protected]> wrote:
> 
> > Hi Jan,
> >
> > Thank you - here's a screen dump from trying to run a FEniCS demo .py:
> >
> > In [1]: run d1_p2D.py
> > Reading DOLFIN parameters from file "dolfin_parameters.xml".
> > [ubuntu:03039] *** Process received signal ***
> > [ubuntu:03039] Signal: Floating point exception (8)
> > [ubuntu:03039] Signal code: Integer divide-by-zero (1)
> > [ubuntu:03039] Failing at address: 0xb74e7da0
> > [ubuntu:03039] [ 0] [0xb77bb40c]
> > [ubuntu:03039] [ 1] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2cda0) [0xb74e7da0]
> > [ubuntu:03039] [ 2] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2e71c) [0xb74e971c]
> > [ubuntu:03039] [ 3] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2ea8b) [0xb74e9a8b]
> > [ubuntu:03039] [ 4] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x98f6) [0xb74c48f6]
> > [ubuntu:03039] [ 5] /usr/lib/i386-linux-gnu/libhwloc.so.5(hwloc_topology_load+0x1c6) [0xb74c58ec]
> > [ubuntu:03039] [ 6] /usr/lib/libopen-rte.so.4(orte_odls_base_open+0x7b1) [0xb770c881]
> > [ubuntu:03039] [ 7] /usr/lib/openmpi/lib/openmpi/mca_ess_hnp.so(+0x2445) [0xb7797445]
> > [ubuntu:03039] [ 8] /usr/lib/libopen-rte.so.4(orte_init+0x1cf) [0xb76e1b3f]
> > [ubuntu:03039] [ 9] /usr/lib/libopen-rte.so.4(orte_daemon+0x256) [0xb76fe1c6]
> > [ubuntu:03039] [10] orted() [0x80485b3]
> > [ubuntu:03039] [11] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb7515a83]
> > [ubuntu:03039] [12] orted() [0x80485f8]
> > [ubuntu:03039] *** End of error message ***
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 343
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 140
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 128
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   orte_ess_set_name failed
> >   --> Returned value Unable to start a daemon on the local node (-128) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned "Unable to start a daemon on the local node" (-128) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > [ubuntu:3035] *** An error occurred in MPI_Init_thread
> > [ubuntu:3035] *** on a NULL communicator
> > [ubuntu:3035] *** Unknown error
> > [ubuntu:3035] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> > --------------------------------------------------------------------------
> > An MPI process is aborting at a time when it cannot guarantee that all
> > of its peer processes in the job will be killed properly.  You should
> > double check that everything has shut down cleanly.
> >
> >   Reason:     Before MPI_INIT completed
> >   Local host: ubuntu
> >   PID:        3035
> > --------------------------------------------------------------------------
> 

_______________________________________________
fenics-support mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics-support
