No, this is not going to be an issue if the opal_identifier_t is used correctly (i.e. only via the exposed accessors).

George.

On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Yeah, my fix won't work for big endian machines - this is going to be an
> issue across the code base now, so we'll have to troll and fix it. I was
> doing the minimal change required to fix the trunk in the meantime.
>
> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> Yes. opal_process_name_t has basically no meaning by itself, it is a
> 64-bit storage location used by the upper layer to save some local key that
> can be later used to extract information. Calling the OPAL level compare
> function might be a better fit there.
>
> George.
>
> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> was it really that simple ?
>>
>> proc_temp->super.proc_name has type opal_process_name_t :
>>     typedef opal_identifier_t opal_process_name_t;
>>     typedef uint64_t opal_identifier_t;
>>
>> *but*
>>
>> item_ptr->peer has type orte_process_name_t :
>>     struct orte_process_name_t {
>>         orte_jobid_t jobid;
>>         orte_vpid_t vpid;
>>     };
>>
>> bottom line, is r32357 still valid on a big endian arch ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> I just fixed this one - all that was required was an ampersand as the
>>> name was being passed into the function instead of a pointer to the name
>>>
>>> r32357
>>>
>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET
>>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Rolf,
>>>
>>> r32353 can be seen as a suspect...
>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>> even more (e.g. we hit the bug 100% after the fix)
>>>
>>> does the attached patch to #4815 fix the problem ?
>>>
>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>> it and drop a note to #4815
>>> ( I am afk until tomorrow)
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>
>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>> ibm/pt2pt/send test running on a single node.
>>>
>>> (gdb) where
>>> #0  0x00007f6c0d1321d0 in ?? ()
>>> #1  <signal handler called>
>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c)
>>>     at ../../orte/util/name_fns.c:522
>>> #3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200,
>>>     peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8, comm=0x6037a0, input=...,
>>>     base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false)
>>>     at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>> #4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64,
>>>     bcol_module=0x7f6bf3b68040, reg_data=0xba28c0)
>>>     at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>> #5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40)
>>>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>> #6  0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40)
>>>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>> #7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40)
>>>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>> #8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58)
>>>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>> #9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58,
>>>     module=0x7fffe7991b90)
>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58,
>>>     module=0x7fffe7991b90)
>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0)
>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0)
>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8)
>>>     at ../../ompi/runtime/ompi_mpi_init.c:918
>>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84
>>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>>> (gdb) up
>>> #1  <signal handler called>
>>> (gdb) up
>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c)
>>>     at ../../orte/util/name_fns.c:522
>>> 522         if (name1->jobid < name2->jobid) {
>>> (gdb) print name1
>>> $1 = (const orte_process_name_t *) 0x192350001
>>> (gdb) print *name1
>>> Cannot access memory at address 0x192350001
>>> (gdb) print name2
>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>> (gdb) print *name2
>>> $3 = {jobid = 2452946945, vpid = 1}
>>> (gdb)
>>>
>>> >-----Original Message-----
>>> >From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
>>> >Sent: Wednesday, July 30, 2014 2:16 AM
>>> >To: Open MPI Developers
>>> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>> >
>>> >George,
>>> >
>>> >#4815 is indirectly related to the move :
>>> >
>>> >in bcol/basesmuma, we used to compare ompi_process_name_t, and now
>>> >we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>> >(which causes a glorious SIGSEGV)
>>> >
>>> >i proposed a temporary patch which is both broken and inelegant, could you
>>> >please advise a correct solution ?
>>> >
>>> >Cheers,
>>> >
>>> >Gilles
>>> >
>>> >On 2014/07/27 7:37, George Bosilca wrote:
>>> >> If you have any issue with the move, I’ll be happy to help and/or support
>>> >> you on your last move toward a completely generic BTL. To facilitate your
>>> >> work I exposed a minimalistic set of OMPI information at the OPAL level. Take
>>> >> a look at opal/util/proc.h for more info, but please try not to expose more.
>>> >
>>> >_______________________________________________
>>> >devel mailing list
>>> >de...@open-mpi.org
>>> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> >Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15348.php
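
To make the two points in this thread concrete (the bad pointer in the trace, and the big endian concern), here is a minimal sketch in C. Everything named demo_* is a hypothetical stand-in and not the OPAL or ORTE API; the 32-bit widths for jobid and vpid are an assumption based on the 64-bit identifier and the gdb output above. On a little-endian host, overlaying {jobid = 2452946945, vpid = 1} onto a 64-bit integer gives 0x192350001, which matches the invalid name1 pointer in frame #2 and fits the explanation that the name was passed where a pointer to the name was expected. The shift-based accessors give the same answer on any byte order, which is the property a raw overlay lacks.

```c
/* Sketch only: demo_* names are hypothetical, not the OPAL/ORTE API. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: jobid and vpid are 32 bits each. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} demo_name_t;

typedef uint64_t demo_identifier_t;

/* Byte-for-byte overlay of the struct onto the 64-bit identifier.
 * The numeric result depends on the host byte order. */
demo_identifier_t demo_overlay(demo_name_t name)
{
    demo_identifier_t id;
    memcpy(&id, &name, sizeof id);
    return id;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int demo_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Accessor-style packing with shifts: identical on any byte order. */
demo_identifier_t demo_make(uint32_t jobid, uint32_t vpid)
{
    return ((demo_identifier_t)jobid << 32) | vpid;
}

uint32_t demo_jobid(demo_identifier_t id) { return (uint32_t)(id >> 32); }
uint32_t demo_vpid(demo_identifier_t id)  { return (uint32_t)(id & 0xffffffffu); }

/* Compare through the accessors only: jobid first, then vpid. */
int demo_compare(demo_identifier_t a, demo_identifier_t b)
{
    if (demo_jobid(a) != demo_jobid(b)) {
        return demo_jobid(a) < demo_jobid(b) ? -1 : 1;
    }
    if (demo_vpid(a) != demo_vpid(b)) {
        return demo_vpid(a) < demo_vpid(b) ? -1 : 1;
    }
    return 0;
}

/* Checks the overlay against the values printed by gdb in this thread. */
void demo_self_check(void)
{
    const demo_name_t peer = { 2452946945u, 1u };  /* gdb's *name2 */

    if (demo_is_little_endian()) {
        /* Same bits as the bogus name1 pointer in frame #2. */
        assert(demo_overlay(peer) == UINT64_C(0x192350001));
    }
    /* The accessors round-trip regardless of byte order. */
    assert(demo_jobid(demo_make(peer.jobid, peer.vpid)) == peer.jobid);
    assert(demo_vpid(demo_make(peer.jobid, peer.vpid)) == peer.vpid);
}
```

Calling demo_self_check() from a test program exercises both paths. The design point is only that code which builds and reads the identifier through one pair of functions never has to care how the bytes are laid out, whereas code that casts between the struct and the integer does.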