All the platforms that failed over the weekend have passed today. -Paul
On Mon, Feb 10, 2014 at 2:34 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> The fastest of my systems that failed over the weekend (a ppc64) has
> completed tests successfully.
> I will report on the ppc32 and SPARC results when they have all passed or
> failed.
>
> -Paul
>
>
> On Mon, Feb 10, 2014 at 1:52 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Tarball is now posted
>>
>> On Feb 10, 2014, at 1:31 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> Generating it now - sorry for my lack of response; my OMPI email was down
>> for some reason. I can now receive it, but still haven't gotten the
>> backlog from the down period.
>>
>>
>> On Feb 10, 2014, at 1:23 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> Ralph,
>>
>> If you give me a heads-up when this makes it into a tarball, I will
>> retest my failing ppc and sparc platforms.
>>
>> -Paul
>>
>>
>> On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>
>>> I have tracked this down. There is a missing commit that affects
>>> ompi_mpi_init.c, causing it to initialize the bml twice.
>>>
>>> Ralph, can you apply r30310 to 1.7?
>>>
>>> Thanks,
>>> Rolf
>>>
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
>>> Sent: Monday, February 10, 2014 12:29 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] 1.7.5 fails on simple test
>>>
>>> I have seen this same issue, although my core dump is a little bit
>>> different. I am running with tcp,self. The first entry in the list of
>>> BTLs is garbage, but then there is tcp and self in the list. Strange.
>>> This is my core dump. Line 208 in bml_r2.c is where I get the SEGV.
>>>
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x00007fb6dec981d0 in ?? ()
>>> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.x86_64
>>> (gdb) where
>>> #0  0x00007fb6dec981d0 in ?? ()
>>> #1  <signal handler called>
>>> #2  0x00007fb6e82fff38 in main_arena () from /lib64/libc.so.6
>>> #3  0x00007fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
>>>     at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>> #4  0x00007fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
>>>     at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
>>> #5  0x00007fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, requested=0, provided=0x7fff80487cc8)
>>>     at ../../ompi/runtime/ompi_mpi_init.c:776
>>> #6  0x00007fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) at pinit.c:84
>>> #7  0x0000000000401c56 in main (argc=1, argv=0x7fff80488158) at MPI_Isend_ator_c.c:143
>>> (gdb)
>>> #3  0x00007fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
>>>     at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>> 208         rc = btl->btl_add_procs(btl, n_new_procs, new_procs, btl_endpoints, reachable);
>>> (gdb) print *btl
>>> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, btl_rndv_eager_limit = 140423556235000,
>>>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 140423556235016,
>>>   btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 140423556235032,
>>>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 3895459624, btl_flags = 32694,
>>>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 <main_arena+184>,
>>>   btl_del_procs = 0x7fb6e82fff38 <main_arena+184>, btl_register = 0x7fb6e82fff48 <main_arena+200>,
>>>   btl_finalize = 0x7fb6e82fff48 <main_arena+200>, btl_alloc = 0x7fb6e82fff58 <main_arena+216>,
>>>   btl_free = 0x7fb6e82fff58 <main_arena+216>, btl_prepare_src = 0x7fb6e82fff68 <main_arena+232>,
>>>   btl_prepare_dst = 0x7fb6e82fff68 <main_arena+232>, btl_send = 0x7fb6e82fff78 <main_arena+248>,
>>>   btl_sendi = 0x7fb6e82fff78 <main_arena+248>, btl_put = 0x7fb6e82fff88 <main_arena+264>,
>>>   btl_get = 0x7fb6e82fff88 <main_arena+264>, btl_dump = 0x7fb6e82fff98 <main_arena+280>,
>>>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 <main_arena+296>,
>>>   btl_ft_event = 0x7fb6e82fffa8 <main_arena+296>}
>>> (gdb)
>>>
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
>>> Sent: Monday, February 10, 2014 4:23 AM
>>> To: Open MPI Developers
>>> Subject: [OMPI devel] 1.7.5 fails on simple test
>>>
>>> $ /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,tcp /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
>>>
>>> [vegas12:12724] *** Process received signal ***
>>> [vegas12:12724] Signal: Segmentation fault (11)
>>> [vegas12:12724] Signal code: (128)
>>> [vegas12:12724] Failing at address: (nil)
>>> [vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>>> [vegas12:12724] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7ffff395f813]
>>> [vegas12:12724] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x7ffff78e14a7]
>>> [vegas12:12724] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7ffff3ded6f2]
>>> [vegas12:12724] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x7ffff78e0cc9]
>>> [vegas12:12724] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x7ffff37481d8]
>>> [vegas12:12724] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x7ffff78f31e0]
>>> [vegas12:12724] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x7ffff78bffdb]
>>> [vegas12:12724] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x7ffff78d4210]
>>> [vegas12:12724] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x7ffff7b71c25]
>>> [vegas12:12724] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
>>> [vegas12:12724] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
>>> [vegas12:12724] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
>>> [vegas12:12724] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
>>> [vegas12:12724] *** End of error message ***
>>> [vegas12:12731] *** Process received signal ***
>>> [vegas12:12731] Signal: Segmentation fault (11)
>>> [vegas12:12731] Signal code: (128)
>>> [vegas12:12731] Failing at address: (nil)
>>> [vegas12:12731] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>>> [vegas12:12731] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7ffff395f813]
>>> [vegas12:12731] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x7ffff78e14a7]
>>> [vegas12:12731] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7ffff3ded6f2]
>>> [vegas12:12731] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x7ffff78e0cc9]
>>> [vegas12:12731] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x7ffff37481d8]
>>> [vegas12:12731] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x7ffff78f31e0]
>>> [vegas12:12731] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x7ffff78bffdb]
>>> [vegas12:12731] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x7ffff78d4210]
>>> [vegas12:12731] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x7ffff7b71c25]
>>> [vegas12:12731] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
>>> [vegas12:12731] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
>>> [vegas12:12731] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
>>> [vegas12:12731] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
>>> [vegas12:12731] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 12724 on node vegas12 exited
>>> on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>> jenkins@vegas12 ~

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
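
For anyone trying to follow the failure mode Rolf describes (the bml initialized twice, leaving a garbage
entry at the front of the BTL list whose function pointers resolve into libc's main_arena), the sketch
below is a minimal, hypothetical C illustration of that class of bug. The names (module_t, select_modules,
buggy_init) are invented for the example and are not Open MPI's actual bml/btl code; the point is only
that running an init/selection pass a second time without clearing the list it populated the first time
leaves a dangling pointer in slot 0, and the later call through its add_procs member is undefined
behavior that typically shows up as exactly this kind of SEGV.

/* double_init_sketch.c -- hypothetical illustration, not Open MPI code. */
#include <stdio.h>
#include <stdlib.h>

typedef struct module {
    const char *name;
    int (*add_procs)(struct module *m);   /* stands in for btl_add_procs */
} module_t;

#define MAX_MODULES 8
static module_t *selected[MAX_MODULES];   /* list of selected modules */
static int n_selected = 0;

static int tcp_add_procs(module_t *m)
{
    printf("%s: add_procs\n", m->name);
    return 0;
}

/* Selection pass: allocate a module and append it to the global list. */
static void select_modules(void)
{
    module_t *m = malloc(sizeof(*m));
    m->name = "tcp";
    m->add_procs = tcp_add_procs;
    selected[n_selected++] = m;
}

/* Buggy init: frees the previously selected modules but never clears the
 * list before re-selecting, so the freed pointer stays in slot 0 and the
 * fresh module lands after it -- "the first entry in the list is garbage,
 * but then there is tcp ... in the list". */
static void buggy_init(void)
{
    for (int i = 0; i < n_selected; i++)
        free(selected[i]);
    /* BUG: missing  n_selected = 0;  (and clearing of the slots) here */
    select_modules();
}

int main(void)
{
    buggy_init();    /* first init:  selected = [tcp]           */
    buggy_init();    /* second init: selected = [dangling, tcp] */

    for (int i = 0; i < n_selected; i++)
        selected[i]->add_procs(selected[i]);   /* slot 0 calls through a
                                                  dangling pointer: undefined
                                                  behavior, usually a SEGV */
    return 0;
}

Building a reproducer like this with -fsanitize=address (or running the failing test under valgrind)
reports the use-after-free at the first list entry directly, which can be quicker than working
backwards from a core file.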