Ralph and George, attached is a patch that fixes heterogeneous support without the abstraction violation.
Cheers,

Gilles

On 2014/08/06 9:40, Gilles Gouaillardet wrote:
> hummm
>
> i intentionally did not swap the two 32 bits (!)
>
> from the top level, what we have is :
>
> typedef union {
>     uint64_t opal;
>     struct {
>         uint32_t jobid;
>         uint32_t vpid;
>     } orte;
> } meta_process_name_t;
>
> OPAL is agnostic about jobid and vpid.
> jobid and vpid are set in ORTE/MPI and OPAL is used only to transport
> the 64 bits
> /* opal_process_name_t and orte_process_name_t are often cast into each
> other */
> at the ORTE/MPI level, jobid and vpid are set individually
> /* e.g. we do *not* do something like opal = jobid | (vpid << 32) */
> this is why everything works fine on homogeneous clusters regardless of
> endianness.
>
> now, on a heterogeneous cluster, things get a bit trickier ...
>
> i was initially unhappy with my commit and i think i found out why :
> it is an abstraction violation !
> the two 32 bits are not swapped by OPAL because this is what is expected
> by ORTE/OMPI.
>
> now i'd like to suggest the following lightweight approach :
>
> at the OPAL level, use #if-protected htonll/ntohll
> (e.g. swap the two 32 bits)
>
> and do the trick at the ORTE level :
>
> simply replace
>
> struct orte_process_name_t {
>     orte_jobid_t jobid;
>     orte_vpid_t vpid;
> };
>
> with
>
> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
> struct orte_process_name_t {
>     orte_vpid_t vpid;
>     orte_jobid_t jobid;
> };
> #else
> struct orte_process_name_t {
>     orte_jobid_t jobid;
>     orte_vpid_t vpid;
> };
> #endif
>
> so we keep OPAL agnostic about how the uint64_t is really used at the
> upper level.
> another option is to make OPAL aware of jobid and vpid, but this is a
> bit more heavyweight imho.
>
> i'll try this today and make sure it works.
>
> any thoughts ?
>
> Cheers,
>
> Gilles
>
> On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Ah yes, so it is - sorry I missed that last test :-/
>>
>> On Aug 5, 2014, at 10:50 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> The code committed by Gilles is correctly protected for big endian
>> (https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely
>> pointing out that I think he should also swap the two 32-bit parts in
>> his implementation.
>>
>> George.
>>
>> On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> On Aug 5, 2014, at 10:23 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>> On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Hmmm... wouldn't that then require that you know (a) the other side is
>>>> little endian, and (b) that you are on a big endian? Otherwise, you
>>>> wind up with the same issue in reverse, yes?
>>>>
>>> This is similar to the 32-bit ntohl that we are using in other parts of
>>> the project. Any little endian participant will do the conversion, while
>>> every big endian participant will use an empty macro instead.
>>>
>>>> In the ORTE methods, we explicitly set the fields (e.g., jobid =
>>>> ntohl(remote->jobid)) to get around this problem. I missed that he did
>>>> it by location instead of named fields - perhaps we should do that
>>>> instead?
>>>>
>>> As soon as we impose the ORTE naming scheme at the OPAL level (aka the
>>> notion of jobid and vpid) this approach will become possible.
>>>
>>> Not proposing that at all so long as the other method will work without
>>> knowing the other side's endianness. Sounds like your approach should
>>> work fine as long as Gilles adds a #if so big endian defines the macro
>>> away
>>>
>>> George.
>>>
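To make the layering Gilles describes concrete, here is a minimal standalone sketch of the union view. It is illustrative only: the real tree casts between opal_process_name_t and orte_process_name_t rather than using an actual union, and meta_process_name_t exists only in this example.

#include <assert.h>
#include <stdint.h>

/* the conceptual picture: OPAL sees an opaque uint64_t, ORTE sees fields */
typedef union {
    uint64_t opal;                /* what OPAL transports, opaque */
    struct {
        uint32_t jobid;
        uint32_t vpid;
    } orte;                       /* what ORTE/OMPI reads and writes */
} meta_process_name_t;

int main(void)
{
    meta_process_name_t a, b;

    /* ORTE sets the fields individually; OPAL never computes
     * jobid | (vpid << 32) - it just carries the 64 bits */
    a.orte.jobid = 123;
    a.orte.vpid  = 45;

    /* transporting the opaque 64 bits preserves the fields exactly,
     * whatever the host endianness - which is why homogeneous
     * clusters work regardless of endianness */
    b.opal = a.opal;
    assert(b.orte.jobid == 123 && b.orte.vpid == 45);
    return 0;
}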
>>>> On Aug 5, 2014, at 10:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> Technically speaking, converting a 64-bit value to its big endian
>>>> representation requires swapping the two 32-bit parts. So the correct
>>>> approach would have been:
>>>>
>>>> uint64_t htonll(uint64_t v)
>>>> {
>>>>     return (((uint64_t)ntohl((uint32_t)v)) << 32) |
>>>>            (uint64_t)ntohl((uint32_t)(v >> 32));
>>>> }
>>>>
>>>> George.
>>>>
>>>> On Tue, Aug 5, 2014 at 5:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> FWIW: that's exactly how we do it in ORTE
>>>>>
>>>>> On Aug 4, 2014, at 10:25 PM, Gilles Gouaillardet
>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>>
>>>>> George,
>>>>>
>>>>> i confirm there was a problem when running on a heterogeneous cluster;
>>>>> this is now fixed in r32425.
>>>>>
>>>>> i am not convinced i chose the most elegant way to achieve the desired
>>>>> result ...
>>>>> could you please double check this commit ?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 2014/08/02 0:14, George Bosilca wrote:
>>>>>
>>>>> Gilles,
>>>>>
>>>>> The design of the BTL move was to let opal_process_name_t be agnostic
>>>>> about what is stored inside, with all accesses done through the
>>>>> provided accessors. Thus, big endian or little endian doesn't make a
>>>>> difference, as long as everything goes through the accessors.
>>>>>
>>>>> I'm skeptical about the support of heterogeneous environments in the
>>>>> current code, so I didn't pay much attention to handling the case in
>>>>> the TCP BTL. But in case we do care, it is enough to make the 2 macros
>>>>> point to something meaningful instead of being empty (bswap_64 or
>>>>> something).
>>>>>
>>>>> George.
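A sketch of what "make the 2 macros point to something meaningful" could look like. This is an assumption, not the committed code: opal_swap_bytes8 is a made-up helper (equivalent to glibc's bswap_64, and to George's htonll on a little endian host), spelled out to stay self-contained.

#include <stdint.h>

/* full 64-bit byte swap; on little endian this equals htonll above */
static inline uint64_t opal_swap_bytes8(uint64_t v)
{
    return ((v & 0x00000000000000ffULL) << 56) |
           ((v & 0x000000000000ff00ULL) << 40) |
           ((v & 0x0000000000ff0000ULL) << 24) |
           ((v & 0x00000000ff000000ULL) <<  8) |
           ((v & 0x000000ff00000000ULL) >>  8) |
           ((v & 0x0000ff0000000000ULL) >> 24) |
           ((v & 0x00ff000000000000ULL) >> 40) |
           ((v & 0xff00000000000000ULL) >> 56);
}

/* little endian hosts convert; big endian hosts keep the empty macros,
 * exactly like the 32-bit ntohl convention elsewhere in the project */
#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
#define OPAL_PROCESS_NAME_HTON(guid) ((guid) = opal_swap_bytes8(guid))
#define OPAL_PROCESS_NAME_NTOH(guid) ((guid) = opal_swap_bytes8(guid))
#else
#define OPAL_PROCESS_NAME_HTON(guid)
#define OPAL_PROCESS_NAME_NTOH(guid)
#endif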
>>>>> On Aug 1, 2014, at 06:52, Gilles Gouaillardet
>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>>
>>>>> George and Ralph,
>>>>>
>>>>> i am very confused whether there is an issue or not.
>>>>>
>>>>> anyway, today Paul and i ran basic tests on big endian machines and
>>>>> did not face any issue related to big endianness.
>>>>>
>>>>> so i did my homework, dug into the code, and basically,
>>>>> opal_process_name_t is used as an orte_process_name_t.
>>>>> for example, in ompi_proc_init :
>>>>>
>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->jobid = OMPI_PROC_MY_NAME->jobid;
>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->vpid = i;
>>>>>
>>>>> and with
>>>>>
>>>>> #define OMPI_CAST_ORTE_NAME(a) ((orte_process_name_t*)(a))
>>>>>
>>>>> so as long as an opal_process_name_t is used as an orte_process_name_t,
>>>>> there is no problem, regardless of the endianness of the homogeneous
>>>>> cluster we are running on.
>>>>>
>>>>> for the sake of readability (and for being pedantic too ;-) ) in
>>>>> r32357,
>>>>> &proc_temp->super.proc_name
>>>>> could be replaced with
>>>>> OMPI_CAST_ORTE_NAME(&proc_temp->super.proc_name)
>>>>>
>>>>> That being said, in btl/tcp, i noticed :
>>>>>
>>>>> in mca_btl_tcp_component_recv_handler :
>>>>>
>>>>> opal_process_name_t guid;
>>>>> [...]
>>>>> /* recv the process identifier */
>>>>> retval = recv(sd, (char *)&guid, sizeof(guid), 0);
>>>>> if(retval != sizeof(guid)) {
>>>>>     CLOSE_THE_SOCKET(sd);
>>>>>     return;
>>>>> }
>>>>> OPAL_PROCESS_NAME_NTOH(guid);
>>>>>
>>>>> and in mca_btl_tcp_endpoint_send_connect_ack :
>>>>>
>>>>> /* send process identifier to remote endpoint */
>>>>> opal_process_name_t guid = btl_proc->proc_opal->proc_name;
>>>>> OPAL_PROCESS_NAME_HTON(guid);
>>>>> if(mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid, sizeof(guid)) !=
>>>>>
>>>>> and with
>>>>>
>>>>> #define OPAL_PROCESS_NAME_NTOH(guid)
>>>>> #define OPAL_PROCESS_NAME_HTON(guid)
>>>>>
>>>>> i had no time to test yet, but for now, i can only suspect :
>>>>> - there will be an issue with the tcp btl on a heterogeneous cluster
>>>>> - for this case, the fix is to have a different version of the
>>>>>   OPAL_PROCESS_NAME_xTOy on little endian arch if heterogeneous mode
>>>>>   is supported.
>>>>>
>>>>> does that make sense ?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 2014/07/31 1:29, George Bosilca wrote:
>>>>>
>>>>> The underlying structure changed, so a little bit of fiddling is
>>>>> normal. Instead of using a field in the ompi_proc_t you are now using
>>>>> a field down in opal_proc_t, a field that simply cannot have the same
>>>>> type as before (orte_process_name_t).
>>>>>
>>>>> George.
>>>>>
>>>>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> George - my point was that we regularly tested using the method in
>>>>> that routine, and now we have to do something a little different. So
>>>>> it is an "issue" in that we have to make changes across the code base
>>>>> to ensure we do things the "new" way, that's all
>>>>>
>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>
>>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>>> correctly (aka only via the exposed accessors).
>>>>>
>>>>> George.
>>>>>
>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> Yeah, my fix won't work for big endian machines - this is going to be
>>>>> an issue across the code base now, so we'll have to troll and fix it.
>>>>> I was doing the minimal change required to fix the trunk in the
>>>>> meantime.
>>>>>
>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>
>>>>> Yes. opal_process_name_t has basically no meaning by itself; it is a
>>>>> 64-bit storage location used by the upper layer to save some local key
>>>>> that can later be used to extract information. Calling the OPAL level
>>>>> compare function might be a better fit there.
>>>>>
>>>>> George.
>>>>>
>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet
>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>> Ralph,
>>>>>
>>>>> was it really that simple ?
>>>>>
>>>>> proc_temp->super.proc_name has type opal_process_name_t :
>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>> typedef uint64_t opal_identifier_t;
>>>>>
>>>>> *but*
>>>>>
>>>>> item_ptr->peer has type orte_process_name_t :
>>>>> struct orte_process_name_t {
>>>>>     orte_jobid_t jobid;
>>>>>     orte_vpid_t vpid;
>>>>> };
>>>>>
>>>>> bottom line, is r32357 still valid on a big endian arch ?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
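A sketch of why Gilles's question matters. The types here are illustrative stand-ins, not the tree's definitions: the same 64 bits viewed as a {jobid, vpid} struct and as a plain uint64_t produce orderings that can disagree, and the uint64_t ordering itself depends on host endianness. Within a single homogeneous run every process agrees with itself, which is why the tests passed.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t opal_identifier_t;                        /* opal's view */
typedef struct { uint32_t jobid; uint32_t vpid; } name_t;  /* orte-like view */

int main(void)
{
    name_t a = { .jobid = 1, .vpid = 2 };
    name_t b = { .jobid = 2, .vpid = 1 };
    opal_identifier_t ia, ib;

    memcpy(&ia, &a, sizeof(ia));
    memcpy(&ib, &b, sizeof(ib));

    /* field-wise view: a < b because jobid 1 < jobid 2, on any host */
    printf("struct view: a %s b\n", a.jobid < b.jobid ? "<" : ">=");

    /* 64-bit view: on little endian jobid occupies the low bytes, so
     * vpid dominates and ia > ib; on big endian jobid occupies the high
     * bytes and ia < ib. Consistent per host, different across hosts. */
    printf("64-bit view: ia %s ib\n", ia < ib ? "<" : ">=");
    return 0;
}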
>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> I just fixed this one - all that was required was an ampersand, as the
>>>>> name was being passed into the function instead of a pointer to the
>>>>> name
>>>>>
>>>>> r32357
>>>>>
>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET
>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>> Rolf,
>>>>>
>>>>> r32353 can be seen as a suspect...
>>>>> even if it is correct, it might have exposed the bug discussed in
>>>>> #4815 even more (e.g. we hit the bug 100% of the time after the fix)
>>>>>
>>>>> does the attached patch to #4815 fix the problem ?
>>>>>
>>>>> if yes, and if you see this issue as a showstopper, feel free to
>>>>> commit it and drop a note to #4815
>>>>> (i am afk until tomorrow)
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>
>>>>> Just an FYI that my trunk version (r32355) does not work at all
>>>>> anymore if I do not include "--mca coll ^ml". Here is a stack trace
>>>>> from the ibm/pt2pt/send test running on a single node.
>>>>>
>>>>> (gdb) where
>>>>> #0  0x00007f6c0d1321d0 in ?? ()
>>>>> #1  <signal handler called>
>>>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>>> #3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8, comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>> #4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, reg_data=0xba28c0) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>> #5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>> #6  0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>> #7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>> #8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>> #9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918
>>>>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84
>>>>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>>>>> (gdb) up
>>>>> #1  <signal handler called>
>>>>> (gdb) up
>>>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>>> 522         if (name1->jobid < name2->jobid) {
>>>>> (gdb) print name1
>>>>> $1 = (const orte_process_name_t *) 0x192350001
>>>>> (gdb) print *name1
>>>>> Cannot access memory at address 0x192350001
>>>>> (gdb) print name2
>>>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>>>> (gdb) print *name2
>>>>> $3 = {jobid = 2452946945, vpid = 1}
>>>>> (gdb)
>>>>>
>>>>> -----Original Message-----
>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>>>>> Gouaillardet
>>>>> Sent: Wednesday, July 30, 2014 2:16 AM
>>>>> To: Open MPI Developers
>>>>> Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>>>>
>>>>> George,
>>>>>
>>>>> #4815 is indirectly related to the move :
>>>>> in bcol/basesmuma, we used to compare two ompi_process_name_t, and now
>>>>> we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>>>> (which causes a glorious SIGSEGV)
>>>>>
>>>>> i proposed a temporary patch which is both broken and inelegant; could
>>>>> you please advise a correct solution ?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 2014/07/27 7:37, George Bosilca wrote:
>>>>>
>>>>> If you have any issue with the move, I'll be happy to help and/or
>>>>> support you on your last move toward a completely generic BTL. To
>>>>> facilitate your work I exposed a minimalistic set of OMPI information
>>>>> at the OPAL level. Take a look at opal/util/proc.h for more info, but
>>>>> please try not to expose more.
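Decoding the gdb output above: name1 = 0x192350001 is not a valid pointer, it is the 64-bit name value itself, passed by value where a pointer was expected (on a little endian host, the low word 0x92350001 = 2452946945 matches name2's jobid, and the high word 1 its vpid). The "OPAL level compare" George suggests never needs the ORTE field layout at all; a hypothetical sketch (opal_compare_proc_names is a made-up name, not the tree's API):

#include <stdint.h>

typedef uint64_t opal_identifier_t;

/* compare two opaque 64-bit names through pointers; no jobid/vpid
 * knowledge needed, and no way to confuse a value with a pointer
 * without the compiler complaining */
static inline int opal_compare_proc_names(const opal_identifier_t *n1,
                                          const opal_identifier_t *n2)
{
    if (*n1 < *n2) return -1;
    if (*n1 > *n2) return  1;
    return 0;
}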
Index: opal/util/proc.h
===================================================================
--- opal/util/proc.h	(revision 32438)
+++ opal/util/proc.h	(working copy)
@@ -39,17 +39,15 @@
 static inline __opal_attribute_always_inline__ void
 opal_process_name_ntoh_intr(opal_process_name_t *name)
 {
-    uint32_t * w = (uint32_t *)name;
-    w[0] = ntohl(w[0]);
-    w[1] = ntohl(w[1]);
+    uint64_t *lw = (uint64_t *)name;
+    *lw = ((((uint64_t)ntohl(*lw)) << 32) | ((uint64_t)ntohl(*lw >> 32)));
 }
 #define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
 static inline __opal_attribute_always_inline__ void
 opal_process_name_hton_intr(opal_process_name_t *name)
 {
-    uint32_t * w = (uint32_t *)name;
-    w[0] = htonl(w[0]);
-    w[1] = htonl(w[1]);
+    uint64_t *lw = (uint64_t *)name;
+    *lw = ((((uint64_t)htonl(*lw)) << 32) | ((uint64_t)htonl(*lw >> 32)));
 }
 #else
 #define OPAL_PROCESS_NAME_NTOH(guid)
Index: orte/include/orte/types.h
===================================================================
--- orte/include/orte/types.h	(revision 32438)
+++ orte/include/orte/types.h	(working copy)
@@ -10,6 +10,8 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
  * Copyright (c) 2014      Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -83,18 +85,18 @@
 #define ORTE_VPID_MAX       UINT32_MAX-2
 #define ORTE_VPID_MIN       0
 
+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
 #define ORTE_PROCESS_NAME_HTON(n) \
-do { \
-    n.jobid = htonl(n.jobid); \
-    n.vpid = htonl(n.vpid); \
-} while (0)
+    opal_process_name_hton_intr((opal_process_name_t *)&(n))
 
 #define ORTE_PROCESS_NAME_NTOH(n) \
-do { \
-    n.jobid = ntohl(n.jobid); \
-    n.vpid = ntohl(n.vpid); \
-} while (0)
+    opal_process_name_ntoh_intr((opal_process_name_t *)&(n))
+#else
+#define ORTE_PROCESS_NAME_HTON(n)
+#define ORTE_PROCESS_NAME_NTOH(n)
+#endif
+
 #define ORTE_NAME_ARGS(n) \
 (unsigned long) ((NULL == n) ? (unsigned long)ORTE_JOBID_INVALID : (unsigned long)(n)->jobid), \
 (unsigned long) ((NULL == n) ? (unsigned long)ORTE_VPID_INVALID : (unsigned long)(n)->vpid) \
@@ -116,10 +118,17 @@
 /*
  * define the process name structure
  */
+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
 struct orte_process_name_t {
+    orte_vpid_t vpid;       /**< Process id - equivalent to rank */
     orte_jobid_t jobid;     /**< Job number */
+};
+#else
+struct orte_process_name_t {
+    orte_jobid_t jobid;     /**< Job number */
     orte_vpid_t vpid;       /**< Process id - equivalent to rank */
 };
+#endif
 typedef struct orte_process_name_t orte_process_name_t;
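For reference, a minimal standalone check of what the patch is meant to guarantee. It is a sketch under stated assumptions, not a test from the tree: le_name/be_name and swap64 are illustrative stand-ins for the patched struct layouts and hton/ntoh helpers, and the program is meant to be run on a little endian host (the big endian peer is simulated with ntohl).

#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* little endian sender's layout after the patch: vpid declared first */
struct le_name { uint32_t vpid; uint32_t jobid; };
/* big endian receiver's (and canonical wire) layout: jobid first */
struct be_name { uint32_t jobid; uint32_t vpid; };

/* what the patched hton/ntoh do on little endian: swap all 64 bits */
static uint64_t swap64(uint64_t v)
{
    return (((uint64_t)htonl((uint32_t)v)) << 32) |
           (uint64_t)htonl((uint32_t)(v >> 32));
}

int main(void)
{
    struct le_name s = { .vpid = 3, .jobid = 0xdeadbeef };
    uint64_t wire;

    memcpy(&wire, &s, sizeof(wire));
    wire = swap64(wire);            /* OPAL_PROCESS_NAME_HTON on LE */

    /* interpret the wire bytes as the big endian peer would; ntohl
     * is a no-op on the real BE peer, it is applied here only so the
     * check can run on a little endian host */
    struct be_name r;
    memcpy(&r, &wire, sizeof(r));
    r.jobid = ntohl(r.jobid);
    r.vpid  = ntohl(r.vpid);

    /* the swapped field order plus the 64-bit swap land jobid and
     * vpid exactly where the big endian peer expects them */
    assert(r.jobid == 0xdeadbeef && r.vpid == 3);
    return 0;
}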