Tried with vader: same crash.

[vegas12:32068] 7 more processes have sent help message help-mca-var.txt / deprecated-mca-env
[vegas12:32068] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
+ LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib
+ OMPI_MCA_scoll_fca_enable=1
+ OMPI_MCA_scoll_fca_np=0
+ OMPI_MCA_pml=ob1
+ OMPI_MCA_btl=vader,self,openib
+ OMPI_MCA_spml=yoda
+ OMPI_MCA_memheap_mr_interleave_factor=8
+ OMPI_MCA_memheap=ptmalloc
+ OMPI_MCA_btl_openib_if_include=mlx4_0:1
+ OMPI_MCA_rmaps_base_dist_hca=mlx4_0
+ OMPI_MCA_memheap_base_hca_name=mlx4_0
+ OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0
+ MXM_RDMA_PORTS=mlx4_0:1
+ SHMEM_SYMMETRIC_HEAP_SIZE=1024M
+ timeout -s SIGSEGV 3m /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun -np 8 /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem
[vegas12][[4652,1],1][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread
[vegas12][[4652,1],0][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread
--------------------------------------------------------------------------
WARNING: The openib BTL was directed to use "eager RDMA" for short
messages, but the openib BTL was compiled with progress threads
support.  Short eager RDMA is not yet supported with progress threads;
its use has been disabled in this job.

This is a warning only; you job will attempt to continue.
--------------------------------------------------------------------------
[vegas12][[4652,1],5][btl_openib_component.c:909:device_destruct] Failed to cancel OpenIB progress thread
[vegas12:32108] *** Process received signal ***
[vegas12:32108] Signal: Segmentation fault (11)
[vegas12:32108] Signal code: Address not mapped (1)
[vegas12:32108] Failing at address: (nil)
[vegas12:32108] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
[vegas12:32108] [ 1] /usr/lib64/libibverbs.so.1(ibv_destroy_comp_channel+0x16)[0x3b7760bf46]
[vegas12:32108] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xdf02)[0x7ffff3fc1f02]
[vegas12:32108] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0xf161)[0x7ffff3fc3161]
[vegas12:32108] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_btl_openib.so(+0x12ab1)[0x7ffff3fc6ab1]
[vegas12:32108] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_btl_base_select+0x117)[0x7ffff7a29807]
[vegas12:32108] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7ffff41ed7e2]
[vegas12:32108] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_bml_base_init+0x99)[0x7ffff7a29009]
[vegas12:32108] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/openmpi/mca_pml_ob1.so(+0x58b5)[0x7ffff35848b5]
[vegas12:32108] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(mca_pml_base_select+0x1e0)[0x7ffff7a3c590]
[vegas12:32108] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/libmpi.so.0(ompi_mpi_init+0x455)[0x7ffff7a06bf5]
[vegas12:32108] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(oshmem_shmem_init+0xfd)[0x7ffff7ca66dd]
[vegas12:32108] [12] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib/liboshmem.so.0(shmem_init+0x28)[0x7ffff7ca9328]
[vegas12:32108] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x40077d]
[vegas12:32108] [14] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
[vegas12:32108] [15] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem[0x4006a9]
[vegas12:32108] *** End of error message ***
[vegas12:32112] *** Process received signal ***
[vegas12:32112] Signal: Segmentation fault (11)
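Because the job runs under `timeout -s SIGSEGV 3m`, a hang and a genuine crash can both end in a SIGSEGV. A minimal sketch, assuming GNU coreutils `timeout` (which exits with status 124 when the time limit expires), of telling the two apart from the exit status; `sleep 5` is a hypothetical stand-in for a hung job, not part of the actual Jenkins script:

```shell
#!/bin/sh
# Sketch (assumes GNU coreutils timeout): distinguish a run that hung and was
# stopped by `timeout -s SIGSEGV` from one that failed on its own.
# `sleep 5` is a hypothetical stand-in for a hung job; a real check would
# run oshrun here instead.
timeout -s SIGSEGV 1 sleep 5
rc=$?

if [ "$rc" -eq 124 ]; then
  # GNU timeout exits with 124 when the time limit expired.
  echo "timed out: the job hung and was sent SIGSEGV by timeout"
elif [ "$rc" -gt 128 ]; then
  # The shell reports death by signal N as 128 + N.
  echo "terminated by signal $((rc - 128)) before the time limit"
else
  echo "exited with status $rc"
fi
```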



On Wed, Jun 25, 2014 at 9:11 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Mike,
>
> Could you try again with
>
> OMPI_MCA_btl=vader,self,openib
>
> It seems the sm module causes a hang
> (which later causes timeout to send a SIGSEGV).
>
> Cheers,
>
> Gilles
>
> On 2014/06/25 14:22, Mike Dubman wrote:
> > Hi,
> > The following commit broke trunk in Jenkins:
> >
> > >>> Per the OMPI developer conference, remove the last vestiges of
> > >>> OMPI_USE_PROGRESS_THREADS
> >
> > + LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib
> > + OMPI_MCA_scoll_fca_enable=1
> > + OMPI_MCA_scoll_fca_np=0
> > + OMPI_MCA_pml=ob1
> > + OMPI_MCA_btl=sm,self,openib
> > + OMPI_MCA_spml=yoda
> > + OMPI_MCA_memheap_mr_interleave_factor=8
> > + OMPI_MCA_memheap=ptmalloc
> > + OMPI_MCA_btl_openib_if_include=mlx4_0:1
> > + OMPI_MCA_rmaps_base_dist_hca=mlx4_0
> > + OMPI_MCA_memheap_base_hca_name=mlx4_0
> > + OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0
> > + MXM_RDMA_PORTS=mlx4_0:1
> > + SHMEM_SYMMETRIC_HEAP_SIZE=1024M
> > + timeout -s SIGSEGV 3m /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun -np 8 /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem
> > [vegas12:08101] *** Process received signal ***
> > [vegas12:08101] Signal: Segmentation fault (11)
> > [vegas12:08101] Signal code: Address not mapped (1)
> > [vegas12:08101] Failing at address: (nil)
> > [vegas12:08101] [
> >
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/06/15055.php
>
