I would suggest not bringing it over in isolation. We planned to do an update
that contains a lot of related changes, including the PMIx update, and we
probably need to do that pretty soon given the June target.


> On May 5, 2017, at 3:04 PM, Vallee, Geoffroy R. <valle...@ornl.gov> wrote:
> 
> Hi,
> 
> I am running some tests on a PPC platform that is using LSF and I see the 
> following problem every time I launch a job that runs on 2 nodes or more:
> 
> [crest1:49998] *** Process received signal ***
> [crest1:49998] Signal: Segmentation fault (11)
> [crest1:49998] Signal code: Address not mapped (1)
> [crest1:49998] Failing at address: 0x10061636d2d
> [crest1:49998] [ 0] [0x100000050478]
> [crest1:49998] [ 1] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(+0x0)[0x1000009c0000]
> [crest1:49998] [ 2] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/liblsf.so(straddr_isIPv4+0x44)[0x100000e31b64]
> [crest1:49998] [ 3] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(lsb_pjob_array2LIST+0x114)[0x100000be79b4]
> [crest1:49998] [ 4] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(lsb_pjob_constructList+0xfc)[0x100000becdbc]
> [crest1:49998] [ 5] /opt/lsf/9.1/linux3.10-glibc2.17-ppc64le/lib/libbat.so(lsb_launch+0x184)[0x100000bed9c4]
> [crest1:49998] [ 6] /ccs/home/gvh/install/crest/ompi3_llvm/lib/openmpi/mca_plm_lsf.so(+0x2660)[0x100000992660]
> [crest1:49998] [ 7] /ccs/home/gvh/install/crest/ompi3_llvm/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x940)[0x1000001f7730]
> [crest1:49998] [ 8] /ccs/home/gvh/install/crest/ompi3_llvm/bin/mpiexec[0x100013e4]
> [crest1:49998] [ 9] /ccs/home/gvh/install/crest/ompi3_llvm/bin/mpiexec[0x10000f10]
> [crest1:49998] [10] /lib64/power8/libc.so.6(+0x24580)[0x1000004f4580]
> [crest1:49998] [11] /lib64/power8/libc.so.6(__libc_start_main+0xc4)[0x1000004f4774]
> [crest1:49998] *** End of error message ***
> 
> I do not experience that problem with master, and the only difference in
> LSF support between master and the v3 branch is:
> 
> https://github.com/open-mpi/ompi/commit/92c996487c589ef8558a087ce2a9923dacdf0b99
> 
> If I can confirm that this change fixes the problem with the v3 branch, would
> you be willing to bring it into the v3 branch?
> 
> Thanks,
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
