Anders Logg wrote:
> On Wed, Oct 07, 2009 at 05:39:05PM +0200, Patrick Riesen wrote:
>> Hi, I have caught up with DOLFIN 0.9.3 on my Linux workstation. Install and
>> compile went fine, and running the demos in serial seems to be OK as well.
>> When I try to run the demos in parallel, however, I get errors with OpenMPI
>> as follows (this occurred when running any demo with mpirun -np xy ./demo,
>> where xy is larger than 1; it did not occur for -np 1):
>>
>> ------------
>> {process output.....}
>>
>> then suddenly
>>
>> [vierzack01:12050] *** An error occurred in MPI_Barrier
>> [vierzack01:12049] *** An error occurred in MPI_Barrier
>> [vierzack01:12049] *** on communicator MPI_COMM_WORLD
>> [vierzack01:12049] *** MPI_ERR_COMM: invalid communicator
>> [vierzack01:12049] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> [vierzack01:12050] *** on communicator MPI_COMM_WORLD
>> [vierzack01:12050] *** MPI_ERR_COMM: invalid communicator
>> [vierzack01:12050] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> [vierzack01:12049] *** Process received signal ***
>> [vierzack01:12049] Signal: Segmentation fault (11)
>> [vierzack01:12049] Signal code: Address not mapped (1)
>> [vierzack01:12049] Failing at address: 0x4
>> [vierzack01:12050] *** Process received signal ***
>> [vierzack01:12050] Signal: Segmentation fault (11)
>> [vierzack01:12050] Signal code: Address not mapped (1)
>> [vierzack01:12050] Failing at address: 0x4
>> [vierzack01:12049] [ 0] /lib/libpthread.so.0 [0x7f0fd3be6410]
>> [vierzack01:12049] [ 1]
>> /home/priesen/num/openmpi/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x34)
>> [0x7f0fd475c1d4]
>> [vierzack01:12049] [ 2]
>> /home/priesen/num/openmpi/lib/libmpi.so.0(ompi_mpi_finalize+0x11b)
>> [0x7f0fd48a8b0b]
>> [vierzack01:12049] [ 3]
>> /scratch-second/priesen/FEniCS/build/lib/libdolfin.so.0(_ZN6dolfin17SubSystemsManager12finalize_mpiEv+0x35)
>> [0x7f0fd7bbfb15]
>> [vierzack01:12049] [ 4]
>> /scratch-second/priesen/FEniCS/build/lib/libdolfin.so.0(_ZN6dolfin17SubSystemsManagerD1Ev+0xe)
>> [0x7f0fd7bbfb2e]
>> [vierzack01:12049] [ 5] /lib/libc.so.6(__cxa_finalize+0x6c) [0x7f0fd39cee0c]
>> [vierzack01:12049] [ 6]
>> /scratch-second/priesen/FEniCS/build/lib/libdolfin.so.0 [0x7f0fd7aa65d3]
>> [vierzack01:12049] *** End of error message ***
>> [vierzack01:12050] [ 0] /lib/libpthread.so.0 [0x7fd707916410]
>> [vierzack01:12050] [ 1]
>> /home/priesen/num/openmpi/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x34)
>> [0x7fd70848c1d4]
>> [vierzack01:12050] [ 2]
>> /home/priesen/num/openmpi/lib/libmpi.so.0(ompi_mpi_finalize+0x11b)
>> [0x7fd7085d8b0b]
>> [vierzack01:12050] [ 3]
>> /scratch-second/priesen/FEniCS/build/lib/libdolfin.so.0(_ZN6dolfin17SubSystemsManager12finalize_mpiEv+0x35)
>> [0x7fd70b8efb15]
>> [vierzack01:12050] [ 4]
>> /scratch-second/priesen/FEniCS/build/lib/libdolfin.so.0(_ZN6dolfin17SubSystemsManagerD1Ev+0xe)
>> [0x7fd70b8efb2e]
>> [vierzack01:12050] [ 5] /lib/libc.so.6(__cxa_finalize+0x6c) [0x7fd7076fee0c]
>> [vierzack01:12050] [ 6]
>> /scratch-second/priesen/FEniCS/build/lib/libdolfin.so.0 [0x7fd70b7d65d3]
>> [vierzack01:12050] *** End of error message ***
>> mpirun noticed that job rank 0 with PID 12049 on node vierzack01 exited
>> on signal 15 (Terminated).
>> 1 additional process aborted (not shown)
>> ------------
>>
>> Is this an OpenMPI error?
>> Is there a specific version of OpenMPI required for DOLFIN?
>> Mine is 1.2.8, and it worked up to DOLFIN 0.9.2.
> 
> No idea what goes wrong. My version of OpenMPI is 1.3.2-3ubuntu1.

Hi, so I installed OpenMPI 1.3.3 and it's still the same problem.
I tried to catch the error; here is a backtrace obtained by attaching gdb
via PETSc, with DOLFIN built in debug mode:

#0  0x00007ff27dd7c07b in raise () from /lib/libc.so.6
#1  0x00007ff27dd7d84e in abort () from /lib/libc.so.6
#2  0x00007ff27f926ea8 in Petsc_MPI_AbortOnError (comm=0x7fff8a060448,
     flag=0x7fff8a060434) at init.c:142
#3  0x00007ff27ec44e0f in ompi_errhandler_invoke ()
    from /home/priesen/num/openmpi-1.3.3/lib/libmpi.so.0
#4  0x00007ff281d83714 in ParMETIS_V3_PartMeshKway ()
    from /scratch-second/priesen/FEniCS/build/lib/libdolfin.so.0
#5  0x00007ff281c8ba0b in dolfin::MeshPartitioning::compute_partition (
     cell_partition=@0x7fff8a0a8900, mesh_data=@0x7fff8a0a8970)
     at dolfin/mesh/MeshPartitioning.cpp:588
#6  0x00007ff281c8bc41 in dolfin::MeshPartitioning::partition (
     mesh=@0x7fff8a0a8b50, mesh_data=@0x7fff8a0a8970)
     at dolfin/mesh/MeshPartitioning.cpp:74
#7  0x00007ff281c6fb42 in Mesh (this=0x7fff8a0a8b50,
     filename=@0x7fff8a0a9cd0)
     at dolfin/mesh/Mesh.cpp:67
#8  0x0000000000429b60 in main ()
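
For what it's worth, any demo that constructs a mesh from a file seems to
end up on this code path when run under mpirun, so a minimal reproducer
should look roughly like the following sketch (the mesh file name is just a
placeholder, any of the demo meshes should do):

#include <dolfin.h>

using namespace dolfin;

int main()
{
  // In parallel, constructing a mesh from a file goes through
  // MeshPartitioning::partition() and from there into ParMETIS.
  Mesh mesh("mesh2D.xml.gz");
  return 0;
}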


Frame 5 seems to be the interesting one, so:

(gdb) f 5
#5  0x00007ff281c8ba0b in dolfin::MeshPartitioning::compute_partition (
     cell_partition=@0x7fff8a0a8900, mesh_data=@0x7fff8a0a8970)
     at dolfin/mesh/MeshPartitioning.cpp:588
588                                &edgecut, part, &(*comm));


and then the surrounding source lines:

(gdb) l
583       // Call ParMETIS to partition mesh
584       ParMETIS_V3_PartMeshKway(elmdist, eptr, eind,
585                                elmwgt, &wgtflag, &numflag, &ncon,
586                                &ncommonnodes, &nparts,
587                                tpwgts, ubvec, options,
588                                &edgecut, part, &(*comm));
589       info("Partitioned mesh, edge cut is %d.", edgecut);
590
591       // Copy mesh_data
592       cell_partition.clear();


When I check the input arguments, elmwgt turns out to be a null pointer:

(gdb) p elmwgt
$4 = (int *) 0x0
(gdb) p *elmwgt
Cannot access memory at address 0x0


So from here I do not know how to proceed any further. Please tell me what
else I could check to determine what goes wrong, or maybe you already know
the cause.
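
One thing I could try, assuming the null elmwgt pointer is what ParMETIS
trips over, is to hand it an explicit array of uniform element weights
instead of NULL, roughly like the untested sketch below (num_local_cells is
just a placeholder for the local element count; if I read the ParMETIS
manual correctly, wgtflag = 2 means element weights are supplied):

  // Untested sketch: replace the null weight pointer with uniform weights
  std::vector<int> uniform_weights(num_local_cells, 1);
  elmwgt = &uniform_weights[0];
  wgtflag = 2;

  ParMETIS_V3_PartMeshKway(elmdist, eptr, eind,
                           elmwgt, &wgtflag, &numflag, &ncon,
                           &ncommonnodes, &nparts,
                           tpwgts, ubvec, options,
                           &edgecut, part, &(*comm));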

Regards,
Patrick

> 
> DOLFIN wasn't parallel before 0.9.3, so I'm not sure what you mean by
> it working up to 0.9.2.
> 
> --
> Anders