Rolf,

r32353 can be seen as a suspect... Even if it is correct, it might have exposed the bug discussed in #4815 even more (e.g. we now hit the bug 100% of the time after the fix).
Does the attached patch to #4815 fix the problem?
If yes, and if you see this issue as a showstopper, feel free to commit it and drop a note to #4815 (I am afk until tomorrow).

Cheers,

Gilles

Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> Just an FYI that my trunk version (r32355) does not work at all anymore if I
> do not include "--mca coll ^ml". Here is a stack trace from the
> ibm/pt2pt/send test running on a single node.
>
> (gdb) where
> #0  0x00007f6c0d1321d0 in ?? ()
> #1  <signal handler called>
> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c)
>     at ../../orte/util/name_fns.c:522
> #3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200,
>     peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8, comm=0x6037a0, input=...,
>     base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false)
>     at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
> #4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64,
>     bcol_module=0x7f6bf3b68040, reg_data=0xba28c0)
>     at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
> #5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40)
>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
> #6  0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40)
>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
> #7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40)
>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
> #8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58)
>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
> #9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58,
>     module=0x7fffe7991b90)
>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58,
>     module=0x7fffe7991b90)
>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90)
>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0)
>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0)
>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8)
>     at ../../ompi/runtime/ompi_mpi_init.c:918
> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84
> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
> (gdb) up
> #1  <signal handler called>
> (gdb) up
> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c)
>     at ../../orte/util/name_fns.c:522
> 522         if (name1->jobid < name2->jobid) {
> (gdb) print name1
> $1 = (const orte_process_name_t *) 0x192350001
> (gdb) print *name1
> Cannot access memory at address 0x192350001
> (gdb) print name2
> $2 = (const orte_process_name_t *) 0xbaf76c
> (gdb) print *name2
> $3 = {jobid = 2452946945, vpid = 1}
> (gdb)
>
> >>-----Original Message-----
> >>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
> >>Sent: Wednesday, July 30, 2014 2:16 AM
> >>To: Open MPI Developers
> >>Subject: Re: [OMPI devel] trunk compilation errors in jenkins
> >>
> >>George,
> >>
> >>#4815 is indirectly related to the move:
> >>
> >>in bcol/basesmuma, we used to compare ompi_process_name_t, and now
> >>we (try to) compare an ompi_process_name_t and an opal_process_name_t
> >>(which causes a glorious SIGSEGV)
> >>
> >>I proposed a temporary patch which is both broken and inelegant, could you
> >>please advise a correct solution?
> >>
> >>Cheers,
> >>
> >>Gilles
> >>
> >>On 2014/07/27 7:37, George Bosilca wrote:
> >>> If you have any issue with the move, I’ll be happy to help and/or support
> >>> you on your last move toward a completely generic BTL.
> >>> To facilitate your work I exposed a minimalistic set of OMPI information
> >>> at the OPAL level. Take a look at opal/util/proc.h for more info, but
> >>> please try not to expose more.
> >>
> >>_______________________________________________
> >>devel mailing list
> >>de...@open-mpi.org
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15348.php