On Wed, 3 Dec 2014 21:07:37 +0000 David Holloway <[email protected]> wrote:
> Hi Jan,
>
> My computations aren't parallel, and I don't believe I have OpenMPI
> installed - I can't find the file SubsystemManager.cpp mentioned in
> the bitbucket issue.

You must have OpenMPI; DOLFIN from the PPA is compiled against it. This looks like an OpenMPI issue, so you should refer to the OpenMPI mailing list. "Unable to start a daemon on the local node (-128)" is the keyword.

Nevertheless, can you try

$ ldd /usr/lib/libdolfin.so | grep -i mpi
$ locate libmpi.so
$ # Which of the following fails?
$ mpirun echo hello world
$ python -c "import dolfin"
$ mpirun -n 2 python -c "import dolfin"

There may be a conflict between libmpi versions, but I'm not sure how to debug it. (A few further isolation checks are sketched at the end of this message, after the quoted trace.)

Jan

> I also get these 'MPI' crashes when I run the gmsh executable from the
> Linux shell command line. I get a very similar crash when trying to run
> my own .py (which calls dolfin) from a different directory, which
> doesn't read the 'dolfin_parameters.xml' shown below.
>
> I have done the following to uninstall/reinstall FEniCS:
>
> sudo apt-get remove fenics*
> sudo apt-get --purge autoremove fenics
> sudo apt-get update
> sudo add-apt-repository ppa:fenics-packages/fenics
> sudo apt-get update
> sudo apt-get install fenics
> sudo apt-get dist-upgrade
>
> (I also did this process with ipython, mayavi2 and gmsh.)
>
> Thank you,
>
> David
>
> > What is your OpenMPI version? It could be
> > https://bitbucket.org/fenics-project/dolfin/issue/384
> >
> > Jan
> >
> > On Wed, 3 Dec 2014 16:27:25 +0000 David Holloway
> > <[email protected]> wrote:
> >
> > Hi Jan,
> >
> > Thank you - here's a screen dump from trying to run a FEniCS demo .py:
> >
> > In [1]: run d1_p2D.py
> > Reading DOLFIN parameters from file "dolfin_parameters.xml".
> > [ubuntu:03039] *** Process received signal ***
> > [ubuntu:03039] Signal: Floating point exception (8)
> > [ubuntu:03039] Signal code: Integer divide-by-zero (1)
> > [ubuntu:03039] Failing at address: 0xb74e7da0
> > [ubuntu:03039] [ 0] [0xb77bb40c]
> > [ubuntu:03039] [ 1] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2cda0) [0xb74e7da0]
> > [ubuntu:03039] [ 2] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2e71c) [0xb74e971c]
> > [ubuntu:03039] [ 3] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x2ea8b) [0xb74e9a8b]
> > [ubuntu:03039] [ 4] /usr/lib/i386-linux-gnu/libhwloc.so.5(+0x98f6) [0xb74c48f6]
> > [ubuntu:03039] [ 5] /usr/lib/i386-linux-gnu/libhwloc.so.5(hwloc_topology_load+0x1c6) [0xb74c58ec]
> > [ubuntu:03039] [ 6] /usr/lib/libopen-rte.so.4(orte_odls_base_open+0x7b1) [0xb770c881]
> > [ubuntu:03039] [ 7] /usr/lib/openmpi/lib/openmpi/mca_ess_hnp.so(+0x2445) [0xb7797445]
> > [ubuntu:03039] [ 8] /usr/lib/libopen-rte.so.4(orte_init+0x1cf) [0xb76e1b3f]
> > [ubuntu:03039] [ 9] /usr/lib/libopen-rte.so.4(orte_daemon+0x256) [0xb76fe1c6]
> > [ubuntu:03039] [10] orted() [0x80485b3]
> > [ubuntu:03039] [11] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb7515a83]
> > [ubuntu:03039] [12] orted() [0x80485f8]
> > [ubuntu:03039] *** End of error message ***
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 343
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 140
> > [ubuntu:03035] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file runtime/orte_init.c at line 128
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can fail
> > during orte_init; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   orte_ess_set_name failed
> >   --> Returned value Unable to start a daemon on the local node (-128)
> >       instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can fail
> > during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned "Unable to start a daemon on the local node" (-128)
> >       instead of "Success" (0)
> > --------------------------------------------------------------------------
> > [ubuntu:3035] *** An error occurred in MPI_Init_thread
> > [ubuntu:3035] *** on a NULL communicator
> > [ubuntu:3035] *** Unknown error
> > [ubuntu:3035] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> > --------------------------------------------------------------------------
> > An MPI process is aborting at a time when it cannot guarantee that all of
> > its peer processes in the job will be killed properly. You should double
> > check that everything has shut down cleanly.
> >
> >   Reason: Before MPI_INIT completed
> >   Local host: ubuntu
> >   PID: 3035
> > --------------------------------------------------------------------------

_______________________________________________
fenics-support mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics-support
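The backtrace above dies with an integer divide-by-zero inside libhwloc.so.5 (in hwloc_topology_load) while orted is starting up, so it can also help to test hwloc and Open MPI in isolation, outside DOLFIN. Below is a minimal sketch of such checks; it assumes the stock Ubuntu packaging of Open MPI and hwloc and that the hwloc command-line utilities (which provide lstopo) are installed.

$ # Which Open MPI and hwloc packages are actually installed?
$ dpkg -l | grep -i -e openmpi -e hwloc
$ # Version of the mpirun that is first on PATH
$ mpirun --version
$ # Build configuration of the Open MPI installation
$ ompi_info | head -n 20
$ # lstopo loads the hardware topology through the same hwloc call the
$ # backtrace fails in; if it also crashes, the bug is in hwloc, not DOLFIN
$ lstopo

If lstopo fails with the same floating point exception, the crash is reproducible without FEniCS and may be worth reporting against hwloc/Open MPI, or working around by upgrading those packages.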
