Yeah, my fix won't work for big-endian machines - this is going to be an issue 
across the code base now, so we'll have to trawl through and fix it. I was 
doing the minimal change required to fix the trunk in the meantime.

On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> Yes. opal_process_name_t has basically no meaning by itself; it is a 64-bit 
> storage location used by the upper layer to save some local key that can 
> later be used to extract information. Calling the OPAL-level compare function 
> might be a better fit there.
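> 
> As a sketch of what I mean (the helper name below is made up, not what is in 
> the tree today), an OPAL-level compare would treat the name as an opaque 
> 64-bit key and never reinterpret its fields:
> 
> #include <stdint.h>
> 
> typedef uint64_t opal_identifier_t;
> typedef opal_identifier_t opal_process_name_t;
> 
> /* Hypothetical OPAL-level compare: the 64 bits are an opaque key,
>  * so compare the whole value instead of looking at jobid/vpid. */
> static inline int opal_compare_proc_names(opal_process_name_t a,
>                                           opal_process_name_t b)
> {
>     return (a < b) ? -1 : (a > b) ? 1 : 0;
> }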
> 
>   George.
> 
> 
> 
> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> Ralph,
> 
> was it really that simple?
> 
> proc_temp->super.proc_name has type opal_process_name_t:
> typedef opal_identifier_t opal_process_name_t;
> typedef uint64_t opal_identifier_t;
> 
> *but*
> 
> item_ptr->peer has type orte_process_name_t:
> struct orte_process_name_t {
>    orte_jobid_t jobid;
>    orte_vpid_t vpid;
> };
> 
> Bottom line: is r32357 still valid on a big-endian arch?
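> 
> Here is a minimal sketch of why I am worried (assuming 32-bit jobid/vpid 
> fields, for illustration): the same {jobid, vpid} bytes read back as 
> different 64-bit keys depending on endianness.
> 
> #include <inttypes.h>
> #include <stdio.h>
> #include <string.h>
> 
> typedef uint32_t orte_jobid_t;   /* assumed widths, for illustration */
> typedef uint32_t orte_vpid_t;
> struct orte_process_name_t { orte_jobid_t jobid; orte_vpid_t vpid; };
> 
> int main(void)
> {
>     struct orte_process_name_t n = { 0x92350001, 1 };
>     uint64_t key;
>     memcpy(&key, &n, sizeof key);  /* reinterpret the struct bytes as a key */
>     /* little endian: key == 0x0000000192350001 (jobid in the low half)
>      * big endian:    key == 0x9235000100000001 (jobid in the high half) */
>     printf("key = 0x%016" PRIx64 "\n", key);
>     return 0;
> }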
> 
> Cheers,
> 
> Gilles
> 
> 
> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I just fixed this one - all that was required was an ampersand, as the name 
> was being passed into the function instead of a pointer to the name.
> 
> r32357
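> 
> For the record, the shape of the change (the identifiers here are 
> illustrative, not the exact line in the tree):
> 
>   /* before: the 64-bit name value itself ended up in the pointer argument */
>   orte_util_compare_name_fields(mask, (orte_process_name_t *)proc->super.proc_name, &item->peer);
> 
>   /* after: pass the address of the name, as the function expects */
>   orte_util_compare_name_fields(mask, (orte_process_name_t *)&proc->super.proc_name, &item->peer);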
> 
> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
> <gilles.gouaillar...@gmail.com> wrote:
> 
>> Rolf,
>> 
>> r32353 can be seen as a suspect...
>> Even if it is correct, it might have exposed the bug discussed in #4815 even 
>> more (e.g. we now hit the bug 100% of the time after the fix).
>> 
>> Does the patch attached to #4815 fix the problem?
>> 
>> If yes, and if you see this issue as a showstopper, feel free to commit it 
>> and drop a note on #4815.
>> (I am afk until tomorrow.)
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>> Just an FYI that my trunk version (r32355) does not work at all anymore if I 
>> do not include "--mca coll ^ml". Here is a stack trace from the 
>> ibm/pt2pt/send test running on a single node.
>> 
>> (gdb) where
>> #0  0x00007f6c0d1321d0 in ?? ()
>> #1  <signal handler called>
>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>> #3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8, comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>> #4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, reg_data=0xba28c0) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>> #5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>> #6  0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>> #7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>> #8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>> #9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918
>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84
>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>> (gdb) up
>> #1  <signal handler called>
>> (gdb) up
>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>> 522           if (name1->jobid < name2->jobid) {
>> (gdb) print name1
>> $1 = (const orte_process_name_t *) 0x192350001
>> (gdb) print *name1
>> Cannot access memory at address 0x192350001
>> (gdb) print name2
>> $2 = (const orte_process_name_t *) 0xbaf76c
>> (gdb) print *name2
>> $3 = {jobid = 2452946945, vpid = 1}
>> (gdb)
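>> 
>> A note on that bad pointer: 0x192350001 is not a random address, it is the 
>> name *value* itself. Assuming 32-bit jobid/vpid fields, on a little-endian 
>> machine {jobid = 2452946945 (0x92350001), vpid = 1} stored as one 64-bit 
>> word reads back as
>> 
>>     ((uint64_t)1 << 32) | 0x92350001 == 0x0000000192350001
>> 
>> which matches name1 exactly - so the caller passed the 64-bit name by value 
>> where the function expected a pointer, consistent with the missing-ampersand 
>> fix above.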
>> 
>> >-----Original Message-----
>> >From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>> >Gouaillardet
>> >Sent: Wednesday, July 30, 2014 2:16 AM
>> >To: Open MPI Developers
>> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>> >
>> >George,
>> >
>> >#4815 is indirectly related to the move:
>> >
>> >in bcol/basesmuma, we used to compare two ompi_process_name_t, and now
>> >we (try to) compare an ompi_process_name_t and an opal_process_name_t
>> >(which causes a glorious SIGSEGV)
>> >
>> >I proposed a temporary patch which is both broken and inelegant; could you
>> >please advise a correct solution?
>> >
>> >Cheers,
>> >
>> >Gilles
>> >
>> >On 2014/07/27 7:37, George Bosilca wrote:
>> >> If you have any issue with the move, I’ll be happy to help and/or support
>> >> you on your last move toward a completely generic BTL. To facilitate your
>> >> work I exposed a minimalistic set of OMPI information at the OPAL level.
>> >> Take a look at opal/util/proc.h for more info, but please try not to
>> >> expose more.