Rolf,

r32353 looks like a suspect.
Even if that commit is correct, it might have exposed the bug discussed in
#4815 even more (i.e. we now hit the bug 100% of the time after the fix).

Does the patch attached to #4815 fix the problem?

If yes, and if you see this issue as a showstopper, feel free to commit it and
drop a note on #4815.
(I am AFK until tomorrow.)

Cheers,

Gilles

Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>
>Just an FYI that my trunk version (r32355) does not work at all anymore if I
>do not include "--mca coll ^ml". Here is a stack trace from the
>ibm/pt2pt/send test running on a single node.
>
>(gdb) where
>#0  0x00007f6c0d1321d0 in ?? ()
>#1  <signal handler called>
>#2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>#3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8,
>    comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>#4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, reg_data=0xba28c0)
>    at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>#5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>#6  0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>#7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>#8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>#9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>#10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>#11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90)
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>#12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>#13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>#14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918
>#15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84
>#16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>(gdb) up
>#1  <signal handler called>
>(gdb) up
>#2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>522           if (name1->jobid < name2->jobid) {
>(gdb) print name1
>$1 = (const orte_process_name_t *) 0x192350001
>(gdb) print *name1
>Cannot access memory at address 0x192350001
>(gdb) print name2
>$2 = (const orte_process_name_t *) 0xbaf76c
>(gdb) print *name2
>$3 = {jobid = 2452946945, vpid = 1}
>(gdb)
>
>>-----Original Message-----
>>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
>>Sent: Wednesday, July 30, 2014 2:16 AM
>>To: Open MPI Developers
>>Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>
>>George,
>>
>>#4815 is indirectly related to the move:
>>
>>in bcol/basesmuma, we used to compare ompi_process_name_t values, and now
>>we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>(which causes a glorious SIGSEGV).
>>
>>I proposed a temporary patch which is both broken and inelegant; could you
>>please advise a correct solution?
>>
>>Cheers,
>>
>>Gilles
>>
>>On 2014/07/27 7:37, George Bosilca wrote:
>>> If you have any issue with the move, I’ll be happy to help and/or support
>>> you on your last move toward a completely generic BTL. To facilitate your
>>> work I exposed a minimalistic set of OMPI information at the OPAL level. Take
>>> a look at opal/util/proc.h for more info, but please try not to expose more.
>>
>>_______________________________________________
>>devel mailing list
>>de...@open-mpi.org
>>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15348.php
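
A side note on the quoted gdb session: the name1 value 0x192350001 does not look like a valid address, and its low 32 bits (0x92350001) equal 2452946945, the jobid gdb prints for *name2. The sketch below decodes that value under an assumption made here only for illustration, and not taken from opal/util/proc.h: that a process name can be packed into a single 64-bit integer with the vpid in the upper half and the jobid in the lower half. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical type, for illustration only: the two fields gdb shows
 * for *name2 in the quoted session. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_fields_t;

/* Assumed layout (not confirmed by the source): vpid in the upper
 * 32 bits, jobid in the lower 32 bits of one 64-bit value. */
static name_fields_t unpack_name(uint64_t packed)
{
    name_fields_t n;
    n.jobid = (uint32_t)(packed & UINT64_C(0xffffffff));
    n.vpid  = (uint32_t)(packed >> 32);
    return n;
}
```

With that assumed layout, unpack_name(0x192350001) yields jobid 2452946945 and vpid 1, the same fields gdb shows for *name2. That would be consistent with a name being passed by value where a pointer is expected, which matches the ompi_process_name_t versus opal_process_name_t mismatch described for #4815.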
