Ralph and George,

attached is a patch that fixes heterogeneous support without
the abstraction violation.

Cheers,

Gilles

On 2014/08/06 9:40, Gilles Gouaillardet wrote:
> hummm
>
> i intentionally did not swap the two 32 bit halves (!)
>
> from the top level, what we have is:
>
> typedef struct {
>    union {
>       uint64_t opal;
>       struct {
>            uint32_t jobid;
>            uint32_t vpid;
>        } orte;
>    };   /* anonymous union: both views alias the same 64 bits */
> } meta_process_name_t;
>
> OPAL is agnostic about jobid and vpid.
> jobid and vpid are set in ORTE/MPI, and OPAL is used only
> to transport the 64 bits
> /* opal_process_name_t and orte_process_name_t are often cast into each
> other */
> at the ORTE/MPI level, jobid and vpid are set individually
> /* e.g. we do *not* do something like opal = jobid | (vpid<<32) */
> this is why everything works fine on homogeneous clusters regardless
> of endianness.
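>
> for illustration, a minimal sketch of that usage (send_bytes is just a
> stand-in for whatever transport is underneath):
>
> meta_process_name_t n;
> n.orte.jobid = 123;       /* named fields, set individually by ORTE/MPI */
> n.orte.vpid  = 4;
> send_bytes(&n.opal, 8);   /* OPAL only sees an opaque 64 bit value */
> /* the receiver reads n.orte.jobid / n.orte.vpid through the same struct
>    layout, so the result is correct whatever the (common) endianness is */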
>
> now on a heterogeneous cluster, things get a bit trickier ...
>
> i was initially unhappy with my commit and i think i found out why:
> this is an abstraction violation!
> the two 32 bit halves are not swapped by OPAL because this is what is
> expected by ORTE/OMPI.
>
> now i'd like to suggest the following lightweight approach:
>
> at the OPAL level, use #if protected htonll/ntohll
> (i.e. swap the two 32 bit halves)
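>
> i.e. something along these lines (a sketch only, the opal_*ll names are
> for illustration; ntohl comes from <arpa/inet.h>):
>
> #if defined(WORDS_BIGENDIAN)
> #define opal_ntohll(v) (v)
> #define opal_htonll(v) (v)
> #else
> #define opal_ntohll(v) ((((uint64_t)ntohl((uint32_t)(v))) << 32) | \
>                         (uint64_t)ntohl((uint32_t)((v) >> 32)))
> #define opal_htonll(v) opal_ntohll(v)   /* byte swap is an involution */
> #endif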
>
> and do the trick at the ORTE level:
>
> simply replace
>
> struct orte_process_name_t {
>     orte_jobid_t jobid;
>     orte_vpid_t vpid;
> };
>
> with
>
> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
> struct orte_process_name_t {
>     orte_vpid_t vpid;
>     orte_jobid_t jobid;
> };
> #else
> struct orte_process_name_t {
>     orte_jobid_t jobid;
>     orte_vpid_t vpid;
> };
> #endif
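>
> /* worked example (sketch), with jobid = J and vpid = V:
>  * big endian node    : memory = { J, V }, read as uint64 -> (J << 32) | V
>  * little endian node : memory = { V, J }, read as uint64 -> (J << 32) | V
>  * both layouts yield the same 64 bit value, so the plain 64 bit
>  * htonll/ntohll at the OPAL level puts identical bytes on the wire */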
>
>
> so we keep OPAL agnostic about how the uint64_t is really used at the upper
> level.
> another option is to make OPAL aware of jobid and vpid, but this is a bit
> more heavyweight imho.
>
> i'll try this today and make sure it works.
>
> any thoughts?
>
> Cheers,
>
> Gilles
>
>
> On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Ah yes, so it is - sorry I missed that last test :-/
>>
>> On Aug 5, 2014, at 10:50 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> The code committed by Gilles is correctly protected for big endian (
>> https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely
>> pointing out that I think he should also swap the two 32 bit halves in his
>> implementation.
>>
>>   George.
>>
>>
>>
>> On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> On Aug 5, 2014, at 10:23 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>> On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Hmmm...wouldn't that then require that you know (a) the other side is
>>>> little endian, and (b) that you are on a big endian machine? Otherwise,
>>>> you wind up with the same issue in reverse, yes?
>>>>
>>> This is similar to the 32 bit ntohl that we are using in other parts of
>>> the project. Any little endian participant will do the conversion, while
>>> every big endian participant will use an empty macro instead.
>>>
>>>
>>>> In the ORTE methods, we explicitly set the fields (e.g., jobid =
>>>> ntohl(remote->jobid)) to get around this problem. I missed that he did it by
>>>> location instead of named fields - perhaps we should do that instead?
>>>>
>>> As soon as we impose the ORTE naming scheme at the OPAL level (i.e. the
>>> notion of jobid and vpid), this approach will become possible.
>>>
>>>
>>> Not proposing that at all so long as the other method will work without
>>> knowing the other side's endianness. Sounds like your approach should work
>>> fine as long as Gilles adds a #if so big endian defines the macro away.
>>>
>>>
>>>   George.
>>>
>>>
>>>
>>>>
>>>> On Aug 5, 2014, at 10:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> Technically speaking, converting a 64 bit value to its big endian
>>>> representation requires swapping the two 32 bit halves. So the correct
>>>> approach would have been:
>>>>
>>>> uint64_t htonll(uint64_t n)
>>>> {
>>>>     return ((((uint64_t)ntohl((uint32_t)n)) << 32) |
>>>>             (uint64_t)ntohl((uint32_t)(n >> 32)));
>>>> }
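>>>>
>>>> (on a little endian host this turns e.g. n = 0x0000000100000002 into
>>>> the wire bytes 00 00 00 01 00 00 00 02; as noted earlier in the thread,
>>>> big endian hosts define the conversion away)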
>>>>
>>>>   George.
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 5:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> FWIW: that's exactly how we do it in ORTE
>>>>>
>>>>> On Aug 4, 2014, at 10:25 PM, Gilles Gouaillardet <
>>>>> gilles.gouaillar...@iferc.org> wrote:
>>>>>
>>>>> George,
>>>>>
>>>>> i confirm there was a problem when running on a heterogeneous cluster;
>>>>> this is now fixed in r32425.
>>>>>
>>>>> i am not convinced i chose the most elegant way to achieve the desired
>>>>> result ...
>>>>> could you please double check this commit?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 2014/08/02 0:14, George Bosilca wrote:
>>>>>
>>>>> Gilles,
>>>>>
>>>>> The design of the BTL move was to let the opal_process_name_t be agnostic 
>>>>> to what is stored inside, and all accesses should be done through the 
>>>>> provided accessors. Thus, big endian or little endian doesn't make a 
>>>>> difference, as long as everything goes through the accessors.
>>>>>
>>>>> I'm skeptical about the support of heterogeneous environments in the
>>>>> current code, so I didn't pay much attention to handling the case in the
>>>>> TCP BTL. But in case we do care, it is enough to make the two macros point
>>>>> to something meaningful instead of being empty (bswap_64 or something).
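>>>>>
>>>>> For example, a sketch of that (bswap_64 is glibc's, from <byteswap.h>;
>>>>> only the two macro names are the existing ones):
>>>>>
>>>>> #include <byteswap.h>
>>>>>
>>>>> #if defined(WORDS_BIGENDIAN)
>>>>> #define OPAL_PROCESS_NAME_NTOH(guid)   /* already in network order */
>>>>> #define OPAL_PROCESS_NAME_HTON(guid)
>>>>> #else
>>>>> #define OPAL_PROCESS_NAME_NTOH(guid) (guid) = bswap_64(guid)
>>>>> #define OPAL_PROCESS_NAME_HTON(guid) (guid) = bswap_64(guid)
>>>>> #endif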
>>>>>
>>>>>   George.
>>>>>
>>>>> On Aug 1, 2014, at 06:52, Gilles Gouaillardet
>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>>
>>>>>
>>>>> George and Ralph,
>>>>>
>>>>> i am very confused about whether there is an issue or not.
>>>>>
>>>>>
>>>>> anyway, today Paul and i ran basic tests on big endian machines and did 
>>>>> not face any issue related to big endianness.
>>>>>
>>>>> so i did my homework, dug into the code, and basically,
>>>>> opal_process_name_t is used as an orte_process_name_t.
>>>>> for example, in ompi_proc_init:
>>>>>
>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->jobid = OMPI_PROC_MY_NAME->jobid;
>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->vpid = i;
>>>>>
>>>>> and with
>>>>>
>>>>> #define OMPI_CAST_ORTE_NAME(a) ((orte_process_name_t*)(a))
>>>>>
>>>>> so as long as an opal_process_name_t is used as an orte_process_name_t,
>>>>> there is no problem,
>>>>> regardless of the endianness of the homogeneous cluster we are running on.
>>>>>
>>>>> for the sake of readability (and for being pedantic too ;-) ) in r32357,
>>>>> &proc_temp->super.proc_name
>>>>> could be replaced with
>>>>> OMPI_CAST_ORTE_NAME(&proc_temp->super.proc_name)
>>>>>
>>>>>
>>>>>
>>>>> That being said, in btl/tcp, i noticed:
>>>>>
>>>>> in mca_btl_tcp_component_recv_handler:
>>>>>
>>>>>     opal_process_name_t guid;
>>>>> [...]
>>>>>     /* recv the process identifier */
>>>>>     retval = recv(sd, (char *)&guid, sizeof(guid), 0);
>>>>>     if(retval != sizeof(guid)) {
>>>>>         CLOSE_THE_SOCKET(sd);
>>>>>         return;
>>>>>     }
>>>>>     OPAL_PROCESS_NAME_NTOH(guid);
>>>>>
>>>>> and in mca_btl_tcp_endpoint_send_connect_ack:
>>>>>
>>>>>     /* send process identifier to remote endpoint */
>>>>>     opal_process_name_t guid = btl_proc->proc_opal->proc_name;
>>>>>
>>>>>     OPAL_PROCESS_NAME_HTON(guid);
>>>>>     if(mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid, 
>>>>> sizeof(guid)) !=
>>>>>
>>>>> and with
>>>>>
>>>>> #define OPAL_PROCESS_NAME_NTOH(guid)
>>>>> #define OPAL_PROCESS_NAME_HTON(guid)
>>>>>
>>>>>
>>>>> i have not had time to test yet, but for now, i can only suspect:
>>>>> - there will be an issue with the tcp btl on a heterogeneous cluster
>>>>> - for this case, the fix is to have a different version of the
>>>>>   OPAL_PROCESS_NAME_xTOy macros on little endian archs if heterogeneous
>>>>>   mode is supported (see the sketch below)
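>>>>>
>>>>> e.g. (sketch, assuming a 64 bit swap helper such as
>>>>> opal_process_name_ntoh_intr/_hton_intr is provided at the OPAL level):
>>>>>
>>>>> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
>>>>> #define OPAL_PROCESS_NAME_NTOH(guid) opal_process_name_ntoh_intr(&(guid))
>>>>> #define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
>>>>> #else
>>>>> #define OPAL_PROCESS_NAME_NTOH(guid)
>>>>> #define OPAL_PROCESS_NAME_HTON(guid)
>>>>> #endif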
>>>>>
>>>>>
>>>>>
>>>>> does that make sense?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> On 2014/07/31 1:29, George Bosilca wrote:
>>>>>
>>>>> The underlying structure changed, so a little bit of fiddling is normal.
>>>>> Instead of using a field in the ompi_proc_t you are now using a field down
>>>>> in opal_proc_t, a field that simply cannot have the same type as before
>>>>> (orte_process_name_t).
>>>>>
>>>>>   George.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>
>>>>> George - my point was that we regularly tested using the method in that
>>>>> routine, and now we have to do something a little different. So it is an
>>>>> "issue" in that we have to make changes across the code base to ensure we
>>>>> do things the "new" way, that's all
>>>>>
>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>
>>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>>> correctly (i.e. only via the exposed accessors).
>>>>>
>>>>>   George.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>
>>>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>>>> issue across the code base now, so we'll have to troll and fix it. I was
>>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>>
>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>
>>>>> Yes. opal_process_name_t has basically no meaning by itself, it is a 64
>>>>> bit storage location used by the upper layer to save some local key that
>>>>> can later be used to extract information. Calling the OPAL level compare
>>>>> function might be a better fit there.
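>>>>>
>>>>> (for illustration, a minimal sketch of such an OPAL level compare; the
>>>>> name is hypothetical, opal_process_name_t being a plain uint64_t here)
>>>>>
>>>>> static inline int opal_process_name_compare(opal_process_name_t a,
>>>>>                                             opal_process_name_t b)
>>>>> {
>>>>>     /* compare the opaque 64 bit values, no jobid/vpid knowledge needed */
>>>>>     return (a < b) ? -1 : ((a > b) ? 1 : 0);
>>>>> }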
>>>>>
>>>>>   George.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> Ralph,
>>>>>
>>>>> was it really that simple?
>>>>>
>>>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>> typedef uint64_t opal_identifier_t;
>>>>>
>>>>> *but*
>>>>>
>>>>> item_ptr->peer has type orte_process_name_t:
>>>>> struct orte_process_name_t {
>>>>>    orte_jobid_t jobid;
>>>>>    orte_vpid_t vpid;
>>>>> };
>>>>>
>>>>> bottom line, is r32357 still valid on a big endian arch?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>
>>>>> I just fixed this one - all that was required was an ampersand, as the
>>>>> name was being passed into the function instead of a pointer to the name.
>>>>>
>>>>> r32357
>>>>>
>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>> Rolf,
>>>>>
>>>>> r32353 can be seen as a suspect...
>>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>>> even more (i.e. we now hit the bug 100% of the time after the fix).
>>>>>
>>>>> does the attached patch to #4815 fix the problem?
>>>>>
>>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>>> it and drop a note to #4815
>>>>> (I am afk until tomorrow)
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>
>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>>>> ibm/pt2pt/send test running on a single node.
>>>>>
>>>>>
>>>>>
>>>>> (gdb) where
>>>>> #0  0x00007f6c0d1321d0 in ?? ()
>>>>> #1  <signal handler called>
>>>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
>>>>>     name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>>> #3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>     (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200,
>>>>>     peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8, comm=0x6037a0,
>>>>>     input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false)
>>>>>     at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>> #4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>     (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>     reg_data=0xba28c0)
>>>>>     at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>> #5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40)
>>>>>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>> #6  0x00007f6c0cced68f in ml_module_memory_initialization
>>>>>     (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>> #7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40)
>>>>>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>> #8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>     priority=0x7fffe7991b58)
>>>>>     at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>> #9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>>     comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0,
>>>>>     priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>>     component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>>     comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0)
>>>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>>     requested=0, provided=0x7fffe79922e8)
>>>>>     at ../../ompi/runtime/ompi_mpi_init.c:918
>>>>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>>     argv=0x7fffe7992340) at pinit.c:84
>>>>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>>>>> (gdb) up
>>>>> #1  <signal handler called>
>>>>> (gdb) up
>>>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
>>>>>     name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>>> 522           if (name1->jobid < name2->jobid) {
>>>>> (gdb) print name1
>>>>> $1 = (const orte_process_name_t *) 0x192350001
>>>>> (gdb) print *name1
>>>>> Cannot access memory at address 0x192350001
>>>>> (gdb) print name2
>>>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>>>> (gdb) print *name2
>>>>> $3 = {jobid = 2452946945, vpid = 1}
>>>>> (gdb)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>>>>> Gouaillardet
>>>>> Sent: Wednesday, July 30, 2014 2:16 AM
>>>>> To: Open MPI Developers
>>>>> Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>>>> George,
>>>>>
>>>>> #4815 is indirectly related to the move:
>>>>> in bcol/basesmuma, we used to compare ompi_process_name_t, and now
>>>>> we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>>>> (which causes a glorious SIGSEGV)
>>>>> i proposed a temporary patch which is both broken and inelegant, could
>>>>> you please advise a correct solution?
>>>>>
>>>>> Cheers,
>>>>> Gilles
>>>>> On 2014/07/27 7:37, George Bosilca wrote:
>>>>>
>>>>> If you have any issue with the move, I'll be happy to help and/or support
>>>>> you on your last move toward a completely generic BTL. To facilitate your
>>>>> work I exposed a minimalistic set of OMPI information at the OPAL level.
>>>>> Take a look at opal/util/proc.h for more info, but please try not to
>>>>> expose more.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>

Index: opal/util/proc.h
===================================================================
--- opal/util/proc.h    (revision 32438)
+++ opal/util/proc.h    (working copy)
@@ -39,17 +39,15 @@
 static inline __opal_attribute_always_inline__ void
 opal_process_name_ntoh_intr(opal_process_name_t *name)
 {
-    uint32_t * w = (uint32_t *)name;
-    w[0] = ntohl(w[0]);
-    w[1] = ntohl(w[1]);
+    uint64_t *lw = (uint64_t *)name;
+    *lw = ((((uint64_t)ntohl(*lw)) << 32) | ((uint64_t)ntohl(*lw >> 32)));
 }
 #define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
 static inline __opal_attribute_always_inline__ void
 opal_process_name_hton_intr(opal_process_name_t *name)
 {
-    uint32_t * w = (uint32_t *)name;
-    w[0] = htonl(w[0]);
-    w[1] = htonl(w[1]);
+    uint64_t *lw = (uint64_t *)name;
+    *lw = ((((uint64_t)htonl(*lw)) << 32) | ((uint64_t)htonl(*lw >> 32)));
 }
 #else
 #define OPAL_PROCESS_NAME_NTOH(guid)
Index: orte/include/orte/types.h
===================================================================
--- orte/include/orte/types.h   (revision 32438)
+++ orte/include/orte/types.h   (working copy)
@@ -10,6 +10,8 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
  * Copyright (c) 2014      Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -83,18 +85,18 @@
 #define ORTE_VPID_MAX       UINT32_MAX-2
 #define ORTE_VPID_MIN       0

+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
 #define ORTE_PROCESS_NAME_HTON(n)       \
-do {                                    \
-    n.jobid = htonl(n.jobid);           \
-    n.vpid = htonl(n.vpid);             \
-} while (0)
+    opal_process_name_hton_intr((opal_process_name_t *)&(n))

 #define ORTE_PROCESS_NAME_NTOH(n)       \
-do {                                    \
-    n.jobid = ntohl(n.jobid);           \
-    n.vpid = ntohl(n.vpid);             \
-} while (0)
+    opal_process_name_ntoh_intr((opal_process_name_t *)&(n))
+#else
+#define ORTE_PROCESS_NAME_HTON(n)

+#define ORTE_PROCESS_NAME_NTOH(n)
+#endif
+
 #define ORTE_NAME_ARGS(n) \
     (unsigned long) ((NULL == n) ? (unsigned long)ORTE_JOBID_INVALID : (unsigned long)(n)->jobid), \
     (unsigned long) ((NULL == n) ? (unsigned long)ORTE_VPID_INVALID : (unsigned long)(n)->vpid) \
@@ -116,10 +118,17 @@
 /*
  * define the process name structure
  */
+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
 struct orte_process_name_t {
+    orte_vpid_t vpid;       /**< Process id - equivalent to rank */
     orte_jobid_t jobid;     /**< Job number */
+};
+#else
+struct orte_process_name_t {
+    orte_jobid_t jobid;     /**< Job number */
     orte_vpid_t vpid;       /**< Process id - equivalent to rank */
 };
+#endif
 typedef struct orte_process_name_t orte_process_name_t;

