Hmm, maybe this has nothing to do with big-endian.
Below is a backtrace from ring_c on an IA64 platform (definitely
little-endian) that looks very similar to me: it fails at the same
coll_ml_lmngr.c:231 as the PPC64 traces below.

It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these
systems, so I wonder if that might be related.
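For reference, a minimal sketch of that check, independent of Open MPI:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* _SC_PAGESIZE reports the VM page size in bytes: 65536 (64K)
         * on the two systems discussed here, vs. the 4096 typical of
         * x86 Linux. */
        printf("page size = %ld\n", sysconf(_SC_PAGESIZE));
        return 0;
    }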

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c
[altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
COLL-ML [altix:20418] *** Process received signal ***
[altix:20418] Signal: Segmentation fault (11)
[altix:20418] Signal code: Invalid permissions (2)
[altix:20418] Failing at address: 0x16
[altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
COLL-ML [altix:20419] *** Process received signal ***
[altix:20419] Signal: Segmentation fault (11)
[altix:20419] Signal code: Invalid permissions (2)
[altix:20419] Failing at address: 0x16
[altix:20418] [ 0] [0xa000000000010800]
[altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x200000000051b2a0]
[altix:20418] [altix:20419] [ 0] [0xa000000000010800]
[altix:20419] [ 1] [ 2]
/lib/libc.so.6.1(strlen-0x92e930)[0x200000000051b2a0]
[altix:20419] [ 2]
/lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x20000000004b15d0]
[altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x20000000004b2860]
[altix:20419] [ 4]
/lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x20000000004aaaa0]
[altix:20419] [ 5]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc5a70)[0x2000000001e55a70]
[altix:20419] [ 6]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0xc84a0)[0x2000000001e584a0]
[altix:20419] [ 7]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc+0x100f520)[0x2000000001e59110]
[altix:20419] [ 8]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block+0xf6e940)[0x2000000001db8540]
[altix:20419] [ 9]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x10130)[0x2000000001da0130]
[altix:20419] [10]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(+0x19970)[0x2000000001da9970]
[altix:20419] [11]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query+0xf6d6b0)[0x2000000001db5830]
[altix:20419] [12]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fbd0)[0x200000000028fbd0]
[altix:20419] [13]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22fac0)[0x200000000028fac0]
[altix:20419] [14]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22f7e0)[0x200000000028f7e0]
[altix:20419] [15]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(+0x22eac0)[0x200000000028eac0]
[altix:20419] [16]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0xbcbb90)[0x200000000027e080]
[altix:20419] [17]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(ompi_mpi_init-0xd38e70)[0x2000000000110db0]
[altix:20419] [18]
/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/libmpi.so.0(MPI_Init-0xc8bf40)[0x20000000001bdcf0]
[altix:20419] [19] examples/ring_c[0x4000000000000c00]
[altix:20419] [20]
/lib/libc.so.6.1(__libc_start_main-0x9f56b0)[0x2000000000454590]
[altix:20419] [21] examples/ring_c[0x4000000000000a20]
[altix:20419] *** End of error message ***
/lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x20000000004b15d0]
[altix:20418] [ 3] /lib/libc.so.6.1(+0x82860)[0x20000000004b2860]
[altix:20418] [ 4]
/lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x20000000004aaaa0]
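
In both traces the fault is in strlen called from vfprintf, invoked from
coll_ml's error-reporting path, and the failing addresses (0x16 here,
0x10 on PPC64) look like near-NULL pointers.  That is the classic
signature of handing a bad pointer to a %s conversion; a minimal sketch
(the 0x16 constant is just the faulting address above, not anything
taken from the OMPI code):

    #include <stdio.h>

    int main(void)
    {
        const char *s = (const char *)0x16;  /* bogus near-NULL pointer */
        printf("%s\n", s);  /* vfprintf -> strlen -> SIGSEGV */
        return 0;
    }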




On Thu, Jul 31, 2014 at 11:47 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Gilles's findings are consistent with mine, which showed the SEGVs to be in
> the coll/ml code.
> I've built with --enable-debug, so below is a backtrace (well, two,
> actually) that might be helpful.
> Unfortunately, the output of the two ranks got slightly entangled.
>
> -Paul
>
> $ ../INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c
> [bd-login][[43502,1],0][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
> COLL-ML [bd-login:09106] *** Process received signal ***
> [bd-login][[43502,1],1][/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
> COLL-ML [bd-login:09107] *** Process received signal ***
> [bd-login:09107] Signal: Segmentation fault (11)
> [bd-login:09107] Signal code: Address not mapped (1)
> [bd-login:09107] Failing at address: 0x10
> [bd-login:09107] [ 0] [bd-login:09106] Signal: Segmentation fault (11)
> [bd-login:09106] Signal code: Address not mapped (1)
> [bd-login:09106] Failing at address: 0x10
> [bd-login:09106] [ 0] [0xfffa96c0418]
> [bd-login:09106] [ 1] [0xfff8f580418]
> [bd-login:09107] [ 1] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968]
> [bd-login:09107] [ 2] /lib64/libc.so.6(_IO_vfprintf-0x157168)[0x80c9b5b968]
> [bd-login:09106] [ 2] /lib64/libc.so.6[0x80c9b600b4]
> [bd-login:09106] [ 3] /lib64/libc.so.6[0x80c9b600b4]
> [bd-login:09107] [ 3] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0]
> [bd-login:09107] [ 4] /lib64/libc.so.6(_IO_vfprintf-0x157010)[0x80c9b5bac0]
> [bd-login:09106] [ 4]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfffa8296580]
> [bd-login:09106] [ 5]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfffa8297604]
> [bd-login:09106] [ 6]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfffa829784c]
> [bd-login:09106] [ 7]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfffa8250d4c]
> [bd-login:09106]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x66580)[0xfff8e156580]
> [bd-login:09107] [ 5]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x67604)[0xfff8e157604]
> [bd-login:09107] [ 6]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_lmngr_alloc-0x1af04)[0xfff8e15784c]
> [bd-login:09107] [ 7]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_allocate_block-0x607b4)[0xfff8e110d4c]
> [bd-login:09107] [ 8]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfff8e1065e4]
> [bd-login:09107] [ 9]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfff8e10a7d8]
> [bd-login:09107] [10]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfff8e10f970]
> [bd-login:09107] [11] [ 8]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x165e4)[0xfffa82465e4]
> [bd-login:09106] [ 9]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(+0x1a7d8)[0xfffa824a7d8]
> [bd-login:09106] [10]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_comm_query-0x61b50)[0xfffa824f970]
> [bd-login:09106] [11]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfff8f4b5ba0]
> [bd-login:09107] [12]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfff8f4b5b14]
> [bd-login:09107] [13]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165ba0)[0xfffa95f5ba0]
> [bd-login:09106] [12]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x165b14)[0xfffa95f5b14]
> [bd-login:09106] [13]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfffa95f59a8]
> [bd-login:09106] [14]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1659a8)[0xfff8f4b59a8]
> [bd-login:09107] [14]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfffa95f57ac]
> [bd-login:09106] [15]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(+0x1657ac)[0xfff8f4b57ac]
> [bd-login:09107] [15]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfff8f4ae3ec]
> [bd-login:09107]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(mca_coll_base_comm_select-0x9b89c)[0xfffa95ee3ec]
> [bd-login:09106] [16] [16]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(ompi_mpi_init-0x13f790)[0xfff8f401408]
> [bd-login:09107] [17]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(ompi_mpi_init-0x13f790)[0xfffa9541408]
> [bd-login:09106] [17]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(MPI_Init-0xf28d4)[0xfffa9591c74]
> [bd-login:09106] [18] examples/ring_c[0x1000099c]
> [bd-login:09106] [19]
> /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/INST/lib/libmpi.so.0(MPI_Init-0xf28d4)[0xfff8f451c74]
> [bd-login:09107] [18] examples/ring_c[0x1000099c]
> [bd-login:09107] [19] /lib64/libc.so.6[0x80c9b2bcd8]
> [bd-login:09107] [20] /lib64/libc.so.6[0x80c9b2bcd8]
> [bd-login:09106] [20]
> /lib64/libc.so.6(__libc_start_main-0x184e00)[0x80c9b2bed0]
> [bd-login:09107] *** End of error message ***
> /lib64/libc.so.6(__libc_start_main-0x184e00)[0x80c9b2bed0]
> [bd-login:09106] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 0 on node bd-login exited on
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
>
>
>
>
>
>
> On Thu, Jul 31, 2014 at 11:39 PM, Gilles Gouaillardet
> <gilles.gouaillar...@iferc.org> wrote:
>
>>  Paul and Ralph,
>>
>> For what it's worth:
>>
>> a) I faced the very same issue on my (sloooow) qemu-emulated ppc64 VM.
>> b) I was able to run very basic programs when passing --mca coll ^ml to
>> mpirun (see the example below).
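>>
>> Combined with the ring test above, that would presumably be something
>> like:
>>
>>   mpirun --mca coll ^ml --mca btl sm,self -np 2 examples/ring_c
>>
>> (the "^" prefix excludes the named component, so every coll component
>> except ml remains eligible)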
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/01 12:30, Ralph Castain wrote:
>>
>> Yes, I fear this will require some effort to chase all the breakage down 
>> given that (to my knowledge, at least) we lack PPC machines in the devel 
>> group.
>>
>>
>> On Jul 31, 2014, at 5:46 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>
>>  On the path to verifying George's atomics patch, I have started just by 
>> verifying that I can still build the UNPATCHED trunk on each of the 
>> platforms I listed.
>>
>> I have tried two PPC64/Linux systems so far and am seeing the same problem 
>> on both.  Though "make check" passes, both platforms SEGV on
>>    mpirun -mca btl sm,self -np 2 examples/ring_c
>>
>> Is this the expected state of the trunk on big-endian systems?
>> I am thinking in particular of 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15365.php in which 
>> Ralph wrote:
>>
>>  Yeah, my fix won't work for big endian machines - this is going to be an 
>> issue across the
>> code base now, so we'll have to troll and fix it. I was doing the minimal 
>> change required to
>> fix the trunk in the meantime.
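>>
>> (A generic illustration of that class of bug, not Ralph's actual fix:
>> code that reads a wide value through a narrower type picks up
>> different bytes on the two byte orders:
>>
>>   #include <stdint.h>
>>   #include <stdio.h>
>>   #include <string.h>
>>
>>   int main(void)
>>   {
>>       uint64_t v = 1;
>>       uint32_t w;
>>       memcpy(&w, &v, sizeof w);  /* first 4 bytes of v */
>>       printf("%u\n", w);  /* 1 on little-endian, 0 on big-endian */
>>       return 0;
>>   }
>>
>> so a fix validated only on x86 can still break PPC.)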
>>
>>  If this big-endian failure is not known/expected, let me know and I'll
>> provide details.
>> Since testing George's patch only requires "make check", I can proceed with
>> that regardless.
>>
>> -Paul
>>
>>
>> On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> Awesome, thanks Paul. When the results are in, we will fix whatever is
>> needed for these less common architectures.
>>
>>   George.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>
>> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> Paul, I know you have a pretty diverse range of computers. Can you try to
>> compile and run a "make check" with the following patch?
>>
>> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset
>> of those is still supported).
>> The ARM and MIPS systems are emulators and take forever to build OMPI.
>> However, I am not even sure how soon I'll get to start this testing.
>>
>>
>> Add SPARC (v8plus and v9) to that list.
>>
>>
>>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
