Ralph and George,

here are two attached patches:
- heterogeneous.v1.patch: a cleanup of the previous patch
- heterogeneous.v2.patch: a new patch based on Ralph's suggestion; I made
the minimal changes to move jobid and vpid into the OPAL layer.

Cheers,

Gilles

On 2014/08/07 11:27, Ralph Castain wrote:
> Are we maybe approaching this from the wrong direction? I ask because we had 
> to do some gyrations in the pmix framework to work around the difference in 
> naming schemes between OPAL and the rest of the code base, and now we have 
> more gyrations here.
>
> Given that the MPI and RTE layers both rely on the structured form of the 
> name, what about if we just mimic that down in OPAL? I think we could perhaps 
> do this in a way that still allows someone to overlay it with a 64-bit 
> unstructured identifier if they want, but that would put the extra work on 
> their side. In other words, we make it easy to work with the other parts of 
> our own code base, acknowledging that those wanting to do something else may 
> have to do some extra work.
>
> I ask because every resource manager out there assigns each process a jobid 
> and vpid in some form of integer format. So we have to absorb that 
> information in {jobid, vpid} format regardless of what we may want to do 
> internally. What we now have to do is immediately convert that into the 
> unstructured form for OPAL (where we take it in via PMI), then convert it 
> back to structured form when passing it up to ORTE so it can be handed to 
> OMPI, and then convert it back to unstructured form every time either OMPI or 
> ORTE accesses the OPAL layer.
>
> Seems awfully convoluted and error prone. Simplifying things for ourselves 
> might make more sense.
>
>
> On Aug 6, 2014, at 1:21 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> Gilles,
>>
>> This looks right. It is really unfortunate that we have to change the 
>> definition of orte_process_name_t for big endian architectures, but I don't 
>> think there is a way around it.
>>
>> Regarding your patch I have two comments:
>> 1. There is a flagrant lack of comments, especially on the ORTE side.
>> 2. At the OPAL level we are really implementing a htonll, and I really think 
>> we should stick to the POSIX prototype (i.e. returning the changed value 
>> instead of doing things in place).
>>
>>   George.
>>
>>
>>
>> On Wed, Aug 6, 2014 at 7:02 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@iferc.org> wrote:
>> Ralph and George,
>>
>> Here is an attached patch that fixes the heterogeneous support without the 
>> abstraction violation.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/08/06 9:40, Gilles Gouaillardet wrote:
>>> Hmmm,
>>>
>>> I intentionally did not swap the two 32-bit halves (!)
>>>
>>> From the top level, what we have is:
>>>
>>> typedef struct {
>>>    union {
>>>       uint64_t opal;
>>>       struct {
>>>            uint32_t jobid;
>>>            uint32_t vpid;
>>>        } orte;
>>>    };
>>> } meta_process_name_t;
>>>
>>> OPAL is agnostic about jobid and vpid.
>>> jobid and vpid are set in ORTE/MPI, and OPAL is used only
>>> to transport the 64 bits.
>>> /* opal_process_name_t and orte_process_name_t are often cast into each
>>> other */
>>> At the ORTE/MPI level, jobid and vpid are set individually
>>> /* e.g. we do *not* do something like opal = jobid | (vpid<<32) */
>>> This is why everything works fine on homogeneous clusters regardless of
>>> endianness.
>>>
>>> Now on a heterogeneous cluster, things get a bit trickier ...
>>>
>>> I was initially unhappy with my commit, and I think I found out why:
>>> it is an abstraction violation!
>>> The two 32-bit halves are not swapped by OPAL because this is what is
>>> expected by ORTE/OMPI.
>>>
>>> Now I'd like to suggest the following lightweight approach:
>>>
>>> at the OPAL level, use #if-protected htonll/ntohll
>>> (i.e. swap the two 32-bit halves)
>>>
>>> and do the trick at the ORTE level:
>>>
>>> simply replace
>>>
>>> struct orte_process_name_t {
>>>     orte_jobid_t jobid;
>>>     orte_vpid_t vpid;
>>> };
>>>
>>> with
>>>
>>> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
>>> struct orte_process_name_t {
>>>     orte_vpid_t vpid;
>>>     orte_jobid_t jobid;
>>> };
>>> #else
>>> struct orte_process_name_t {
>>>     orte_jobid_t jobid;
>>>     orte_vpid_t vpid;
>>> };
>>> #endif
>>>
>>>
>>> so we keep OPAL agnostic about how the uint64_t is really used at the upper
>>> level.
>>> Another option is to make OPAL aware of jobid and vpid, but that is a bit
>>> more heavyweight imho.
>>>
>>> I'll try this today and make sure it works.
>>>
>>> Any thoughts?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Ah yes, so it is - sorry I missed that last test :-/
>>>>
>>>> On Aug 5, 2014, at 10:50 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> The code committed by Gilles is correctly protected for big endian (
>>>> https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely
>>>> pointing out that I think he should also swap the two 32-bit halves in his
>>>> implementation.
>>>>
>>>>   George.
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> On Aug 5, 2014, at 10:23 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>
>>>>> On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>> Hmmm...wouldn't that then require that you know (a) the other side is
>>>>>> little endian, and (b) that you are on a big endian? Otherwise, you wind 
>>>>>> up
>>>>>> with the same issue in reverse, yes?
>>>>>>
>>>>> This is similar to the 32-bit ntohl that we are using in other parts of
>>>>> the project. Any little endian participant will do the conversion, while
>>>>> every big endian participant will use an empty macro instead.
>>>>>
>>>>>
>>>>>> In the ORTE methods, we explicitly set the fields (e.g., jobid =
>>>>>> ntohl(remote-jobid)) to get around this problem. I missed that he did it 
>>>>>> by
>>>>>> location instead of named fields - perhaps we should do that instead?
>>>>>>
>>>>> As soon as we impose the ORTE naming scheme at the OPAL level (aka. the
>>>>> notion of jobid and vpid) this approach will become possible.
>>>>>
>>>>>
>>>>> Not proposing that at all so long as the other method will work without
>>>>> knowing the other side's endianness. Sounds like your approach should work
>>>>> fine as long as Gilles adds a #if so big endian defines the macro away
>>>>>
>>>>>
>>>>>   George.
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 5, 2014, at 10:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>
>>>>>> Technically speaking, converting a 64-bit value to its big endian
>>>>>> representation requires swapping the two 32-bit parts. So the correct
>>>>>> approach would have been:
>>>>>> uint64_t htonll(uint64_t v)
>>>>>> {
>>>>>>     return (((uint64_t)htonl((uint32_t)v)) << 32) | (uint64_t)htonl((uint32_t)(v >> 32));
>>>>>> }
>>>>>>
>>>>>>   George.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 5:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>>> FWIW: that's exactly how we do it in ORTE
>>>>>>>
>>>>>>> On Aug 4, 2014, at 10:25 PM, Gilles Gouaillardet <
>>>>>>> gilles.gouaillar...@iferc.org
>>>>>>>> wrote:
>>>>>>> George,
>>>>>>>
>>>>>>> I confirm there was a problem when running on a heterogeneous cluster;
>>>>>>> this is now fixed in r32425.
>>>>>>>
>>>>>>> I am not convinced I chose the most elegant way to achieve the desired
>>>>>>> result ...
>>>>>>> Could you please double check this commit?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On 2014/08/02 0:14, George Bosilca wrote:
>>>>>>>
>>>>>>> Gilles,
>>>>>>>
>>>>>>> The design of the BTL move was to let the opal_process_name_t be 
>>>>>>> agnostic to what is stored inside, and all accesses should be done 
>>>>>>> through the provided accessors. Thus, big endian or little endian 
>>>>>>> doesn't make a difference, as long as everything goes through the 
>>>>>>> accessors.
>>>>>>>
>>>>>>> I'm skeptical about the support of heterogeneous environments in the 
>>>>>>> current code, so I didn't pay much attention to handling the case in 
>>>>>>> the TCP BTL. But in case we do care it is enough to make  the 2 macros 
>>>>>>> point to something meaningful instead of being empty (bswap_64 or 
>>>>>>> something).
>>>>>>>
>>>>>>>   George.
>>>>>>>
>>>>>>> On Aug 1, 2014, at 06:52 , Gilles Gouaillardet 
>>>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> George and Ralph,
>>>>>>>
>>>>>>> I am very confused about whether there is an issue or not.
>>>>>>>
>>>>>>>
>>>>>>> Anyway, today Paul and I ran basic tests on big endian machines and did 
>>>>>>> not face any issue related to big endianness.
>>>>>>>
>>>>>>> So I did my homework, dug into the code, and basically, 
>>>>>>> opal_process_name_t is used as an orte_process_name_t.
>>>>>>> For example, in ompi_proc_init:
>>>>>>>
>>>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->jobid = 
>>>>>>> OMPI_PROC_MY_NAME->jobid;
>>>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->vpid = i;
>>>>>>>
>>>>>>> and with
>>>>>>>
>>>>>>> #define OMPI_CAST_ORTE_NAME(a) ((orte_process_name_t*)(a))
>>>>>>>
>>>>>>> So as long as an opal_process_name_t is used as an orte_process_name_t, 
>>>>>>> there is no problem,
>>>>>>> regardless of the endianness of the homogeneous cluster we are running on.
>>>>>>>
>>>>>>> For the sake of readability (and for being pedantic too ;-) ), in r32357,
>>>>>>> &proc_temp->super.proc_name
>>>>>>> could be replaced with
>>>>>>> OMPI_CAST_ORTE_NAME(&proc_temp->super.proc_name)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> That being said, in btl/tcp, I noticed:
>>>>>>>
>>>>>>> in mca_btl_tcp_component_recv_handler :
>>>>>>>
>>>>>>>     opal_process_name_t guid;
>>>>>>> [...]
>>>>>>>     /* recv the process identifier */
>>>>>>>     retval = recv(sd, (char *)&guid, sizeof(guid), 0);
>>>>>>>     if(retval != sizeof(guid)) {
>>>>>>>         CLOSE_THE_SOCKET(sd);
>>>>>>>         return;
>>>>>>>     }
>>>>>>>     OPAL_PROCESS_NAME_NTOH(guid);
>>>>>>>
>>>>>>> and in mca_btl_tcp_endpoint_send_connect_ack :
>>>>>>>
>>>>>>>     /* send process identifier to remote endpoint */
>>>>>>>     opal_process_name_t guid = btl_proc->proc_opal->proc_name;
>>>>>>>
>>>>>>>     OPAL_PROCESS_NAME_HTON(guid);
>>>>>>>     if(mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid, 
>>>>>>> sizeof(guid)) !=
>>>>>>>
>>>>>>> and with
>>>>>>>
>>>>>>> #define OPAL_PROCESS_NAME_NTOH(guid)
>>>>>>> #define OPAL_PROCESS_NAME_HTON(guid)
>>>>>>>
>>>>>>>
>>>>>>> I have not had time to test yet, but for now, I can only suspect:
>>>>>>> - there will be an issue with the tcp btl on a heterogeneous cluster
>>>>>>> - for this case, the fix is to have a different version of the 
>>>>>>> OPAL_PROCESS_NAME_xTOy macros
>>>>>>>   on little endian archs if heterogeneous mode is supported.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Does that make sense?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>> On 2014/07/31 1:29, George Bosilca wrote:
>>>>>>>
>>>>>>> The underlying structure changed, so a little bit of fiddling is normal.
>>>>>>> Instead of using a field in the ompi_proc_t you are now using a field 
>>>>>>> down
>>>>>>> in opal_proc_t, a field that simply cannot have the same type as before
>>>>>>> (orte_process_name_t).
>>>>>>>
>>>>>>>   George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> George - my point was that we regularly tested using the method in that
>>>>>>> routine, and now we have to do something a little different. So it is an
>>>>>>> "issue" in that we have to make changes across the code base to ensure 
>>>>>>> we
>>>>>>> do things the "new" way, that's all
>>>>>>>
>>>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>
>>>>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>>>>> correctly (aka only via the exposed accessors).
>>>>>>>
>>>>>>>   George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>>>>>> issue across the code base now, so we'll have to troll and fix it. I was
>>>>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>>>>
>>>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>
>>>>>>> Yes. opal_process_name_t has basically no meaning by itself, it is a 64
>>>>>>> bits storage location used by the upper layer to save some local key 
>>>>>>> that
>>>>>>> can be later used to extract information. Calling the OPAL level compare
>>>>>>> function might be a better fit there.
>>>>>>>
>>>>>>>   George.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet 
>>>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> was it really that simple ?
>>>>>>>
>>>>>>> proc_temp->super.proc_name has type opal_process_name_t :
>>>>>>> typedef opal_identifier_t opal_process_name_t;
>>>>>>> typedef uint64_t opal_identifier_t;
>>>>>>>
>>>>>>> *but*
>>>>>>>
>>>>>>> item_ptr->peer has type orte_process_name_t :
>>>>>>> struct orte_process_name_t {
>>>>>>>    orte_jobid_t jobid;
>>>>>>>    orte_vpid_t vpid;
>>>>>>> };
>>>>>>>
>>>>>>> bottom line, is r32357 still valid on a big endian arch ?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I just fixed this one - all that was required was an ampersand as the
>>>>>>> name was being passed into the function instead of a pointer to the name
>>>>>>>
>>>>>>> r32357
>>>>>>>
>>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET 
>>>>>>> <gilles.gouaillar...@gmail.com>
>>>>>>>  wrote:
>>>>>>>
>>>>>>> Rolf,
>>>>>>>
>>>>>>> r32353 can be seen as a suspect ...
>>>>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>>>>> even more (e.g. we hit the bug 100% of the time after the fix).
>>>>>>>
>>>>>>> Does the attached patch to #4815 fix the problem?
>>>>>>>
>>>>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>>>>> it and drop a note to #4815.
>>>>>>> (I am afk until tomorrow)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>>>>>>>
>>>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>>>>> if I do not include "--mca coll ^ml".    Here is a stack trace from the
>>>>>>> ibm/pt2pt/send test running on a single node.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> (gdb) where
>>>>>>>
>>>>>>> #0  0x00007f6c0d1321d0 in ?? ()
>>>>>>>
>>>>>>> #1  <signal handler called>
>>>>>>>
>>>>>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>
>>>>>>> #3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
>>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, 
>>>>>>> peer_list=0x7f6c0c0a6748,
>>>>>>> back_files=0x7f6bf3ffd6c8,
>>>>>>>
>>>>>>>     comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606
>>>>>>> "sm_payload_mem_", map_all=false) at
>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>>>>>
>>>>>>> #4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti
>>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
>>>>>>> reg_data=0xba28c0)
>>>>>>>
>>>>>>>     at
>>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>>>>>
>>>>>>> #5  0x00007f6c0cced386 in mca_coll_ml_register_bcols
>>>>>>> (ml_module=0xba5c40) at 
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>>>>>
>>>>>>> #6  0x00007f6c0cced68f in ml_module_memory_initialization
>>>>>>> (ml_module=0xba5c40) at 
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>>>>>
>>>>>>> #7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>>>>>
>>>>>>> #8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
>>>>>>> priority=0x7fffe7991b58) at
>>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>>>>>
>>>>>>> #9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>
>>>>>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>>>>>
>>>>>>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940,
>>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>>>>>>>
>>>>>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>>>>>
>>>>>>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0,
>>>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>>>>>>>
>>>>>>>     at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>>>>>
>>>>>>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
>>>>>>> comm=0x6037a0) at 
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>>>>>
>>>>>>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
>>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>>>>>
>>>>>>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
>>>>>>> requested=0, provided=0x7fffe79922e8) at
>>>>>>> ../../ompi/runtime/ompi_mpi_init.c:918
>>>>>>>
>>>>>>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
>>>>>>> argv=0x7fffe7992340) at pinit.c:84
>>>>>>>
>>>>>>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at
>>>>>>> send.c:32
>>>>>>>
>>>>>>> (gdb) up
>>>>>>>
>>>>>>> #1  <signal handler called>
>>>>>>>
>>>>>>> (gdb) up
>>>>>>>
>>>>>>> #2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15
>>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at 
>>>>>>> ../../orte/util/name_fns.c:522
>>>>>>>
>>>>>>> 522           if (name1->jobid < name2->jobid) {
>>>>>>>
>>>>>>> (gdb) print name1
>>>>>>>
>>>>>>> $1 = (const orte_process_name_t *) 0x192350001
>>>>>>>
>>>>>>> (gdb) print *name1
>>>>>>>
>>>>>>> Cannot access memory at address 0x192350001
>>>>>>>
>>>>>>> (gdb) print name2
>>>>>>>
>>>>>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>>>>>>
>>>>>>> (gdb) print *name2
>>>>>>>
>>>>>>> $3 = {jobid = 2452946945, vpid = 1}
>>>>>>>
>>>>>>> (gdb)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
>>>>>>> Sent: Wednesday, July 30, 2014 2:16 AM
>>>>>>> To: Open MPI Developers
>>>>>>> Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>>>>>> George,
>>>>>>>
>>>>>>> #4815 is indirectly related to the move:
>>>>>>> in bcol/basesmuma, we used to compare ompi_process_name_t, and now
>>>>>>> we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>>>>>> (which causes a glorious SIGSEGV).
>>>>>>> I proposed a temporary patch which is both broken and inelegant; could
>>>>>>> you please advise a correct solution?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Gilles
>>>>>>>
>>>>>>> On 2014/07/27 7:37, George Bosilca wrote:
>>>>>>>
>>>>>>> If you have any issue with the move, I'll be happy to help and/or support
>>>>>>> you on your last move toward a completely generic BTL. To facilitate your
>>>>>>> work I exposed a minimalistic set of OMPI information at the OPAL level.
>>>>>>> Take a look at opal/util/proc.h for more info, but please try not to
>>>>>>> expose more.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> de...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15348.php
>>>>>>>
>>>>>>>

Index: opal/util/proc.h
===================================================================
--- opal/util/proc.h    (revision 32440)
+++ opal/util/proc.h    (working copy)
@@ -21,7 +21,7 @@
 #include "opal/dss/dss.h"

 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
-#include <arpa/inet.h>
+#include "opal/types.h"
 #endif

 /**
@@ -35,22 +35,11 @@
 typedef opal_identifier_t opal_process_name_t;

 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
-#define OPAL_PROCESS_NAME_NTOH(guid) opal_process_name_ntoh_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_ntoh_intr(opal_process_name_t *name)
-{
-    uint32_t * w = (uint32_t *)name;
-    w[0] = ntohl(w[0]);
-    w[1] = ntohl(w[1]);
-}
-#define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_hton_intr(opal_process_name_t *name)
-{
-    uint32_t * w = (uint32_t *)name;
-    w[0] = htonl(w[0]);
-    w[1] = htonl(w[1]);
-}
+#define OPAL_PROCESS_NAME_NTOH(guid) \
+    guid = ntoh64(guid)
+
+#define OPAL_PROCESS_NAME_HTON(guid) \
+    guid = hton64(guid)
 #else
 #define OPAL_PROCESS_NAME_NTOH(guid)
 #define OPAL_PROCESS_NAME_HTON(guid)
Index: orte/include/orte/types.h
===================================================================
--- orte/include/orte/types.h   (revision 32440)
+++ orte/include/orte/types.h   (working copy)
@@ -10,6 +10,8 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
  * Copyright (c) 2014      Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -83,18 +85,18 @@
 #define ORTE_VPID_MAX       UINT32_MAX-2
 #define ORTE_VPID_MIN       0

-#define ORTE_PROCESS_NAME_HTON(n)       \
-do {                                    \
-    n.jobid = htonl(n.jobid);           \
-    n.vpid = htonl(n.vpid);             \
-} while (0)
+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
+#define ORTE_PROCESS_NAME_HTON(n)                      \
+    OPAL_PROCESS_NAME_HTON(*(opal_process_name_t *)&(n))

-#define ORTE_PROCESS_NAME_NTOH(n)       \
-do {                                    \
-    n.jobid = ntohl(n.jobid);           \
-    n.vpid = ntohl(n.vpid);             \
-} while (0)
+#define ORTE_PROCESS_NAME_NTOH(n)                      \
+    OPAL_PROCESS_NAME_NTOH(*(opal_process_name_t *)&(n))
+#else
+#define ORTE_PROCESS_NAME_HTON(n)

+#define ORTE_PROCESS_NAME_NTOH(n)
+#endif
+
 #define ORTE_NAME_ARGS(n) \
     (unsigned long) ((NULL == n) ? (unsigned long)ORTE_JOBID_INVALID : 
(unsigned long)(n)->jobid), \
     (unsigned long) ((NULL == n) ? (unsigned long)ORTE_VPID_INVALID : 
(unsigned long)(n)->vpid) \
@@ -115,11 +117,23 @@

 /*
  * define the process name structure
+ * the OPAL layer sees an orte_process_name_t as an opal_process_name_t aka 
uint64_t
+ * if heterogeneous is supported, when converting this uint64_t to
+ * an endian neutral format, vpid and jobid will be swapped.
+ * consequently, the orte_process_name_t struct must have different definitions
+ * (swap jobid and vpid) on little and big endian arch.
  */
+#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
 struct orte_process_name_t {
+    orte_vpid_t vpid;       /**< Process id - equivalent to rank */
     orte_jobid_t jobid;     /**< Job number */
+};
+#else
+struct orte_process_name_t {
+    orte_jobid_t jobid;     /**< Job number */
     orte_vpid_t vpid;       /**< Process id - equivalent to rank */
 };
+#endif
 typedef struct orte_process_name_t orte_process_name_t;


Index: oshmem/mca/scoll/mpi/scoll_mpi_module.c
===================================================================
--- oshmem/mca/scoll/mpi/scoll_mpi_module.c     (revision 32440)
+++ oshmem/mca/scoll/mpi/scoll_mpi_module.c     (working copy)
@@ -1,11 +1,13 @@
 /**
-  Copyright (c) 2011 Mellanox Technologies. All rights reserved.
-  Copyright (c) 2014 Cisco Systems, Inc.  All rights reserved.
-  $COPYRIGHT$
-
-  Additional copyrights may follow
-
- $HEADER$
+ * Copyright (c) 2011 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2014 Cisco Systems, Inc.  All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
+ * $COPYRIGHT$
+ *
+ * Additional copyrights may follow
+ *
+ * $HEADER$
  */

 #include "ompi_config.h"
@@ -125,7 +127,7 @@
             ompi_proc_t* ompi_proc;
             for( int j = 0; j < ompi_group_size(parent_group); j++ ) {
                 ompi_proc = ompi_group_peer_lookup(parent_group, j);
-                if( ompi_proc->super.proc_name == osh_group->proc_array[i]->super.proc_name) {
+                if( ompi_proc->super.proc_name.id == osh_group->proc_array[i]->super.proc_name.id) {
                     ranks[i] = j;
                     break;
                 }
Index: opal/mca/btl/tcp/btl_tcp_proc.c
===================================================================
--- opal/mca/btl/tcp/btl_tcp_proc.c     (revision 32440)
+++ opal/mca/btl/tcp/btl_tcp_proc.c     (working copy)
@@ -12,6 +12,8 @@
  *                         All rights reserved.
  * Copyright (c) 2008-2010 Oracle and/or its affiliates.  All rights reserved
  * Copyright (c) 2013      Intel, Inc. All rights reserved
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -77,7 +79,7 @@
     /* remove from list of all proc instances */
     OPAL_THREAD_LOCK(&mca_btl_tcp_component.tcp_lock);
     opal_hash_table_remove_value_uint64(&mca_btl_tcp_component.tcp_procs, 
-                                        tcp_proc->proc_opal->proc_name);
+                                        tcp_proc->proc_opal->proc_name.id);
     OPAL_THREAD_UNLOCK(&mca_btl_tcp_component.tcp_lock);

     /* release resources */
@@ -97,7 +99,7 @@

 mca_btl_tcp_proc_t* mca_btl_tcp_proc_create(const opal_proc_t* proc)
 {
-    uint64_t hash = proc->proc_name;
+    uint64_t hash = proc->proc_name.id;
     mca_btl_tcp_proc_t* btl_proc;
     size_t size;
     int rc;
@@ -719,7 +721,7 @@
     mca_btl_tcp_proc_t* proc = NULL;
     OPAL_THREAD_LOCK(&mca_btl_tcp_component.tcp_lock);
     opal_hash_table_get_value_uint64(&mca_btl_tcp_component.tcp_procs, 
-                                     *name, (void**)&proc);
+                                     name->id, (void**)&proc);
     OPAL_THREAD_UNLOCK(&mca_btl_tcp_component.tcp_lock);
     return proc;
 }
Index: opal/mca/btl/openib/btl_openib.c
===================================================================
--- opal/mca/btl/openib/btl_openib.c    (revision 32440)
+++ opal/mca/btl/openib/btl_openib.c    (working copy)
@@ -1064,7 +1064,7 @@
             rc = mca_btl_openib_ib_address_add_new(
                     ib_proc->proc_ports[j].pm_port_info.lid,
                     ib_proc->proc_ports[j].pm_port_info.subnet_id,
-                    opal_process_name_jobid(proc->proc_name), endpoint);
+                    proc->proc_name, endpoint);
             if (OPAL_SUCCESS != rc ) {
                 OPAL_THREAD_UNLOCK(&ib_proc->proc_lock);
                 return OPAL_ERROR;
Index: opal/util/proc.c
===================================================================
--- opal/util/proc.c    (revision 32440)
+++ opal/util/proc.c    (working copy)
@@ -3,6 +3,8 @@
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2013      Inria.  All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -29,7 +31,7 @@
 static opal_proc_t opal_local_proc = {
     { .opal_list_next = NULL,
       .opal_list_prev = NULL},
-    0x1122334455667788,
+    { .id = 0x1122334455667788},
     0,
     0,
     NULL,
@@ -42,13 +44,13 @@
     proc->proc_arch = opal_local_arch;
     proc->proc_convertor = NULL;
     proc->proc_flags = 0;
-    proc->proc_name = 0;
+    proc->proc_name.id = 0;
 }

 static void opal_proc_destruct(opal_proc_t* proc)
 {
     proc->proc_flags     = 0;
-    proc->proc_name      = 0;
+    proc->proc_name.id   = 0;
     proc->proc_hostname  = NULL;
     proc->proc_convertor = NULL;
 }
@@ -60,8 +62,8 @@
 opal_compare_opal_procs(const opal_process_name_t proc1,
                         const opal_process_name_t proc2)
 {
-    if( proc1 == proc2 ) return  0;
-    if( proc1 <  proc2 ) return -1;
+    if( proc1.id == proc2.id ) return  0;
+    if( proc1.id <  proc2.id ) return -1;
     return 1;
 }

Index: opal/util/proc.h
===================================================================
--- opal/util/proc.h    (revision 32440)
+++ opal/util/proc.h    (working copy)
@@ -32,25 +32,30 @@
  * is to be copied from one structure to another, otherwise it should
  * only be used via the accessors defined below.
  */
-typedef opal_identifier_t opal_process_name_t;
+typedef uint32_t opal_jobid_t;
+typedef uint32_t opal_vpid_t;
+typedef struct {
+    opal_jobid_t jobid;
+    opal_vpid_t vpid;
+} opal_proc_name_t;

+typedef union {
+    opal_proc_name_t name;
+    opal_identifier_t id;
+} opal_process_name_t;
+
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
-#define OPAL_PROCESS_NAME_NTOH(guid) opal_process_name_ntoh_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_ntoh_intr(opal_process_name_t *name)
-{
-    uint32_t * w = (uint32_t *)name;
-    w[0] = ntohl(w[0]);
-    w[1] = ntohl(w[1]);
-}
-#define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_hton_intr(opal_process_name_t *name)
-{
-    uint32_t * w = (uint32_t *)name;
-    w[0] = htonl(w[0]);
-    w[1] = htonl(w[1]);
-}
+#define OPAL_PROCESS_NAME_NTOH(n)       \
+do {                                    \
+    n.name.jobid = ntohl(n.name.jobid); \
+    n.name.vpid  = ntohl(n.name.vpid);  \
+} while (0)
+
+#define OPAL_PROCESS_NAME_HTON(n)       \
+do {                                    \
+    n.name.jobid = htonl(n.name.jobid); \
+    n.name.vpid  = htonl(n.name.vpid);  \
+} while (0)
 #else
 #define OPAL_PROCESS_NAME_NTOH(guid)
 #define OPAL_PROCESS_NAME_HTON(guid)
Index: ompi/mca/dpm/orte/dpm_orte.c
===================================================================
--- ompi/mca/dpm/orte/dpm_orte.c        (revision 32440)
+++ ompi/mca/dpm/orte/dpm_orte.c        (working copy)
@@ -16,6 +16,8 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC.  All rights
  *                         reserved. 
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -1767,7 +1769,7 @@
 }

 static void paccept_recv(int status,
-                         struct orte_process_name_t* peer,
+                         orte_process_name_t* peer,
                          struct opal_buffer_t* buffer,
                          orte_rml_tag_t tag,
                          void* cbdata)
Index: orte/mca/rml/rml.h
===================================================================
--- orte/mca/rml/rml.h  (revision 32440)
+++ orte/mca/rml/rml.h  (working copy)
@@ -11,6 +11,8 @@
  *                         All rights reserved.
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC.  All rights
  *                         reserved. 
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -52,7 +54,6 @@


 struct opal_buffer_t;
-struct orte_process_name_t;
 struct orte_rml_module_t;
 typedef struct {
     opal_object_t super;
@@ -146,7 +147,7 @@
  * @param[in] cbdata  User data passed to send_nb()
  */
 typedef void (*orte_rml_callback_fn_t)(int status,
-                                       struct orte_process_name_t* peer,
+                                       orte_process_name_t* peer,
                                        struct iovec* msg,
                                        int count,
                                        orte_rml_tag_t tag,
@@ -171,7 +172,7 @@
  * @param[in] cbdata  User data passed to send_buffer_nb() or recv_buffer_nb()
  */
 typedef void (*orte_rml_buffer_callback_fn_t)(int status,
-                                              struct orte_process_name_t* peer,
+                                              orte_process_name_t* peer,
                                               struct opal_buffer_t* buffer,
                                               orte_rml_tag_t tag,
                                               void* cbdata);
@@ -315,7 +316,7 @@
  *                    receiving process is not available
  * @retval ORTE_ERROR  An unspecified error occurred
  */
-typedef int (*orte_rml_module_send_nb_fn_t)(struct orte_process_name_t* peer,
+typedef int (*orte_rml_module_send_nb_fn_t)(orte_process_name_t* peer,
                                             struct iovec* msg,
                                             int count,
                                             orte_rml_tag_t tag,
@@ -345,7 +346,7 @@
  *                    receiving process is not available
  * @retval ORTE_ERROR  An unspecified error occurred
  */
-typedef int (*orte_rml_module_send_buffer_nb_fn_t)(struct orte_process_name_t* peer,
+typedef int (*orte_rml_module_send_buffer_nb_fn_t)(orte_process_name_t* peer,
                                                    struct opal_buffer_t* buffer,
                                                    orte_rml_tag_t tag,
                                                    orte_rml_buffer_callback_fn_t cbfunc,
@@ -360,7 +361,7 @@
  * @param[in] cbfunc   Callback function on message comlpetion
  * @param[in] cbdata   User data to provide during completion callback
  */
-typedef void (*orte_rml_module_recv_nb_fn_t)(struct orte_process_name_t* peer,
+typedef void (*orte_rml_module_recv_nb_fn_t)(orte_process_name_t* peer,
                                              orte_rml_tag_t tag,
                                              bool persistent,
                                              orte_rml_callback_fn_t cbfunc,
@@ -376,7 +377,7 @@
  * @param[in] cbfunc   Callback function on message comlpetion
  * @param[in] cbdata   User data to provide during completion callback
  */
-typedef void (*orte_rml_module_recv_buffer_nb_fn_t)(struct orte_process_name_t* peer,
+typedef void (*orte_rml_module_recv_buffer_nb_fn_t)(orte_process_name_t* peer,
                                                     orte_rml_tag_t tag,
                                                     bool persistent,
                                                     orte_rml_buffer_callback_fn_t cbfunc,
@@ -427,7 +428,7 @@
  * to/from a specified process. Used when a process aborts
  * and is to be restarted
  */
-typedef void (*orte_rml_module_purge_fn_t)(struct orte_process_name_t *peer);
+typedef void (*orte_rml_module_purge_fn_t)(orte_process_name_t *peer);

 /* ******************************************************************** */

Index: orte/mca/rml/base/base.h
===================================================================
--- orte/mca/rml/base/base.h    (revision 32440)
+++ orte/mca/rml/base/base.h    (working copy)
@@ -12,6 +12,8 @@
  *                         All rights reserved.
  * Copyright (c) 2007-2014 Los Alamos National Security, LLC.  All rights
  *                         reserved. 
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -245,23 +247,23 @@
 ORTE_DECLSPEC void orte_rml_base_process_error(int fd, short flags, void *cbdata);

 /* null functions */
-int orte_rml_base_null_send_nb(struct orte_process_name_t* peer,
+int orte_rml_base_null_send_nb(orte_process_name_t* peer,
                                struct iovec* msg,
                                int count,
                                orte_rml_tag_t tag,
                                orte_rml_callback_fn_t cbfunc,
                                void* cbdata);
-int orte_rml_base_null_send_buffer_nb(struct orte_process_name_t* peer,
+int orte_rml_base_null_send_buffer_nb(orte_process_name_t* peer,
                                       struct opal_buffer_t* buffer,
                                       orte_rml_tag_t tag,
                                       orte_rml_buffer_callback_fn_t cbfunc,
                                       void* cbdata);
-void orte_rml_base_null_recv_nb(struct orte_process_name_t* peer,
+void orte_rml_base_null_recv_nb(orte_process_name_t* peer,
                                 orte_rml_tag_t tag,
                                 bool persistent,
                                 orte_rml_callback_fn_t cbfunc,
                                 void* cbdata);
-void orte_rml_base_null_recv_buffer_nb(struct orte_process_name_t* peer,
+void orte_rml_base_null_recv_buffer_nb(orte_process_name_t* peer,
                                        orte_rml_tag_t tag,
                                        bool persistent,
                                        orte_rml_buffer_callback_fn_t cbfunc,
Index: orte/mca/routed/routed.h
===================================================================
--- orte/mca/routed/routed.h    (revision 32440)
+++ orte/mca/routed/routed.h    (working copy)
@@ -51,7 +51,6 @@


 struct opal_buffer_t;
-struct orte_process_name_t;
 struct orte_rml_module_t;


Index: orte/include/orte/types.h
===================================================================
--- orte/include/orte/types.h   (revision 32440)
+++ orte/include/orte/types.h   (working copy)
@@ -10,6 +10,8 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
  * Copyright (c) 2014      Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -27,6 +29,7 @@
 #include <sys/types.h>
 #endif
 #include "opal/dss/dss_types.h"
+#include "opal/util/proc.h"

 /**
  * Supported datatypes for messaging and storage operations.
@@ -74,11 +77,11 @@
  * the other, and it will cause problems in the communication subsystems
  */

-typedef uint32_t orte_jobid_t;
+typedef opal_jobid_t orte_jobid_t;
 #define ORTE_JOBID_T        OPAL_UINT32
 #define ORTE_JOBID_MAX      UINT32_MAX-2
 #define ORTE_JOBID_MIN      0
-typedef uint32_t orte_vpid_t;
+typedef opal_vpid_t orte_vpid_t;
 #define ORTE_VPID_T         OPAL_UINT32
 #define ORTE_VPID_MAX       UINT32_MAX-2
 #define ORTE_VPID_MIN       0
@@ -116,11 +119,7 @@
 /*
  * define the process name structure
  */
-struct orte_process_name_t {
-    orte_jobid_t jobid;     /**< Job number */
-    orte_vpid_t vpid;       /**< Process id - equivalent to rank */
-};
-typedef struct orte_process_name_t orte_process_name_t;
+typedef opal_proc_name_t orte_process_name_t;


 /**
