Ralph and George,

Attached are two patches:
- heterogeneous.v1.patch: a cleanup of the previous patch
- heterogeneous.v2.patch: a new patch based on Ralph's suggestion; I made the minimal changes needed to move jobid and vpid into the OPAL layer.
Cheers, Gilles On 2014/08/07 11:27, Ralph Castain wrote: > Are we maybe approaching this from the wrong direction? I ask because we had > to do some gyrations in the pmix framework to work around the difference in > naming schemes between OPAL and the rest of the code base, and now we have > more gyrations here. > > Given that the MPI and RTE layers both rely on the structured form of the > name, what about if we just mimic that down in OPAL? I think we could perhaps > do this in a way that still allows someone to overlay it with a 64-bit > unstructured identifier if they want, but that would put the extra work on > their side. In other words, we make it easy to work with the other parts of > our own code base, acknowledging that those wanting to do something else may > have to do some extra work. > > I ask because every resource manager out there assigns each process a jobid > and vpid in some form of integer format. So we have to absorb that > information in {jobid, vpid} format regardless of what we may want to do > internally. What we now have to do is immediately convert that into the > unstructured form for OPAL (where we take it in via PMI), then convert it > back to structured form when passing it up to ORTE so it can be handed to > OMPI, and then convert it back to unstructured form every time either OMPI or > ORTE accesses the OPAL layer. > > Seems awfully convoluted and error prone. Simplifying things for ourselves > might make more sense. > > > On Aug 6, 2014, at 1:21 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > >> Gilles, >> >> This looks right. It is really unfortunately that we have to change the >> definition of orte_process_name_t for big endian architectures, but I don't >> think there is a way around. >> >> Regarding your patch I have two comments: >> 1. There is a flagrant lack of comments ... especially on the ORTE side >> 2. 
at the OPAL level we are really implementing a htonll, and I really think >> we should stick to the POSIX prototype (aka. returning the changes value >> instead of doing things inplace). >> >> George. >> >> >> >> On Wed, Aug 6, 2014 at 7:02 AM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> Ralph and George, >> >> here is attached a patch that fixes the heterogeneous support without the >> abstraction violation. >> >> Cheers, >> >> Gilles >> >> >> On 2014/08/06 9:40, Gilles Gouaillardet wrote: >>> hummm >>> >>> i intentionally did not swap the two 32 bits (!) >>> >>> from the top level, what we have is : >>> >>> typedef struct { >>> union { >>> uint64_t opal; >>> struct { >>> uint32_t jobid; >>> uint32_t vpid; >>> } orte; >>> } meta_process_name_t; >>> >>> OPAL is agnostic about jobid and vpid. >>> jobid and vpid are set in ORTE/MPI and OPAL is used only >>> to transport the 64 bits >>> /* opal_process_name_t and orte_process_name_t are often casted into each >>> other */ >>> at ORTE/MPI level, jobid and vpid are set individually >>> /* e.g. we do *not* do something like opal = jobid | (vpid<<32) */ >>> this is why everything works fine on homogeneous clusters regardless >>> endianness. >>> >>> now in heterogeneous cluster, thing get a bit trickier ... >>> >>> i was initially unhappy with my commit and i think i found out why : >>> this is an abstraction violation ! >>> the two 32 bits are not swapped by OPAL because this is what is expected by >>> the ORTE/OMPI. >>> >>> now i d like to suggest the following lightweight approach : >>> >>> at OPAL, use #if protected htonll/ntohll >>> (e.g. 
swap the two 32bits) >>> >>> do the trick at the ORTE level : >>> >>> simply replace >>> >>> struct orte_process_name_t { >>> orte_jobid_t jobid; >>> orte_vpid_t vpid; >>> }; >>> >>> with >>> >>> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN) >>> struct orte_process_name_t { >>> orte_vpid_t vpid; >>> orte_jobid_t jobid; >>> }; >>> #else >>> struct orte_process_name_t { >>> orte_jobid_t jobid; >>> orte_vpid_t vpid; >>> }; >>> #endif >>> >>> >>> so we keep OPAL agnostic about how the uint64_t is really used at the upper >>> level. >>> an other option is to make OPAL aware of jobid and vpid but this is a bit >>> more heavyweight imho. >>> >>> i'll try this today and make sure it works. >>> >>> any thoughts ? >>> >>> Cheers, >>> >>> Gilles >>> >>> >>> On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <r...@open-mpi.org> wrote: >>> >>>> Ah yes, so it is - sorry I missed that last test :-/ >>>> >>>> On Aug 5, 2014, at 10:50 AM, George Bosilca <bosi...@icl.utk.edu> wrote: >>>> >>>> The code committed by Gilles is correctly protected for big endian ( >>>> https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely >>>> pointing out that I think he should also swap the 2 32 bits in his >>>> implementation. >>>> >>>> George. >>>> >>>> >>>> >>>> On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>> >>>>> On Aug 5, 2014, at 10:23 AM, George Bosilca <bosi...@icl.utk.edu> wrote: >>>>> >>>>> On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>> >>>>>> Hmmm...wouldn't that then require that you know (a) the other side is >>>>>> little endian, and (b) that you are on a big endian? Otherwise, you wind >>>>>> up >>>>>> with the same issue in reverse, yes? >>>>>> >>>>> This is similar to the 32 bits ntohl that we are using in other parts of >>>>> the project. Any little endian participant will do the conversion, while >>>>> every big endian participant will use an empty macro instead. 
>>>>> >>>>> >>>>>> In the ORTE methods, we explicitly set the fields (e.g., jobid = >>>>>> ntohl(remote-jobid)) to get around this problem. I missed that he did it >>>>>> by >>>>>> location instead of named fields - perhaps we should do that instead? >>>>>> >>>>> As soon as we impose the ORTE naming scheme at the OPAL level (aka. the >>>>> notion of jobid and vpid) this approach will become possible. >>>>> >>>>> >>>>> Not proposing that at all so long as the other method will work without >>>>> knowing the other side's endianness. Sounds like your approach should work >>>>> fine as long as Gilles adds a #if so big endian defines the macro away >>>>> >>>>> >>>>> George. >>>>> >>>>> >>>>> >>>>>> On Aug 5, 2014, at 10:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote: >>>>>> >>>>>> Technically speaking, converting a 64 bits to a big endian >>>>>> representation requires the swap of the 2 32 bits parts. So the correct >>>>>> approach would have been: >>>>>> uint64_t htonll(uint64_t v) >>>>>> { >>>>>> return ((((uint64_t)ntohl(n)) << 32 | (uint64_t)ntohl(n >> 32)); >>>>>> } >>>>>> >>>>>> George. >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Aug 5, 2014 at 5:52 AM, Ralph Castain <r...@open-mpi.org> wrote: >>>>>> >>>>>>> FWIW: that's exactly how we do it in ORTE >>>>>>> >>>>>>> On Aug 4, 2014, at 10:25 PM, Gilles Gouaillardet < >>>>>>> gilles.gouaillar...@iferc.org >>>>>>>> wrote: >>>>>>> George, >>>>>>> >>>>>>> i confirm there was a problem when running on an heterogeneous cluster, >>>>>>> this is now fixed in r32425. >>>>>>> >>>>>>> i am not convinced i chose the most elegant way to achieve the desired >>>>>>> result ... >>>>>>> could you please double check this commit ? 
>>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Gilles >>>>>>> >>>>>>> On 2014/08/02 0:14, George Bosilca wrote: >>>>>>> >>>>>>> Gilles, >>>>>>> >>>>>>> The design of the BTL move was to let the opal_process_name_t be >>>>>>> agnostic to what is stored inside, and all accesses should be done >>>>>>> through the provided accessors. Thus, big endian or little endian >>>>>>> doesn't make a difference, as long as everything goes through the >>>>>>> accessors. >>>>>>> >>>>>>> I'm skeptical about the support of heterogeneous environments in the >>>>>>> current code, so I didn't pay much attention to handling the case in >>>>>>> the TCP BTL. But in case we do care it is enough to make the 2 macros >>>>>>> point to something meaningful instead of being empty (bswap_64 or >>>>>>> something). >>>>>>> >>>>>>> George. >>>>>>> >>>>>>> On Aug 1, 2014, at 06:52 , Gilles Gouaillardet >>>>>>> <gilles.gouaillar...@iferc.org> <gilles.gouaillar...@iferc.org> wrote: >>>>>>> >>>>>>> >>>>>>> George and Ralph, >>>>>>> >>>>>>> i am very confused whether there is an issue or not. >>>>>>> >>>>>>> >>>>>>> anyway, today Paul and i ran basic tests on big endian machines and did >>>>>>> not face any issue related to big endianness. >>>>>>> >>>>>>> so i made my homework, digged into the code, and basically, >>>>>>> opal_process_name_t is used as an orte_process_name_t. >>>>>>> for example, in ompi_proc_init : >>>>>>> >>>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->jobid = >>>>>>> OMPI_PROC_MY_NAME->jobid; >>>>>>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->vpid = i; >>>>>>> >>>>>>> and with >>>>>>> >>>>>>> #define OMPI_CAST_ORTE_NAME(a) ((orte_process_name_t*)(a)) >>>>>>> >>>>>>> so as long as an opal_process_name_t is used as an orte_process_name_t, >>>>>>> there is no problem, >>>>>>> regardless the endianness of the homogenous cluster we are running on. 
>>>>>>> >>>>>>> for the sake of readability (and for being pedantic too ;-) ) in r32357, >>>>>>> &proc_temp->super.proc_name >>>>>>> could be replaced with >>>>>>> OMPI_CAST_ORTE_NAME(&proc_temp->super.proc_name) >>>>>>> >>>>>>> >>>>>>> >>>>>>> That being said, in btl/tcp, i noticed : >>>>>>> >>>>>>> in mca_btl_tcp_component_recv_handler : >>>>>>> >>>>>>> opal_process_name_t guid; >>>>>>> [...] >>>>>>> /* recv the process identifier */ >>>>>>> retval = recv(sd, (char *)&guid, sizeof(guid), 0); >>>>>>> if(retval != sizeof(guid)) { >>>>>>> CLOSE_THE_SOCKET(sd); >>>>>>> return; >>>>>>> } >>>>>>> OPAL_PROCESS_NAME_NTOH(guid); >>>>>>> >>>>>>> and in mca_btl_tcp_endpoint_send_connect_ack : >>>>>>> >>>>>>> /* send process identifier to remote endpoint */ >>>>>>> opal_process_name_t guid = btl_proc->proc_opal->proc_name; >>>>>>> >>>>>>> OPAL_PROCESS_NAME_HTON(guid); >>>>>>> if(mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid, >>>>>>> sizeof(guid)) != >>>>>>> >>>>>>> and with >>>>>>> >>>>>>> #define OPAL_PROCESS_NAME_NTOH(guid) >>>>>>> #define OPAL_PROCESS_NAME_HTON(guid) >>>>>>> >>>>>>> >>>>>>> i had no time yet to test yet, but for now, i can only suspect : >>>>>>> - there will be an issue with the tcp btl on an heterogeneous cluster >>>>>>> - for this case, the fix is to have a different version of the >>>>>>> OPAL_PROCESS_NAME_xTOy >>>>>>> on little endian arch if heterogeneous mode is supported. >>>>>>> >>>>>>> >>>>>>> >>>>>>> does that make sense ? >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Gilles >>>>>>> >>>>>>> >>>>>>> On 2014/07/31 1:29, George Bosilca wrote: >>>>>>> >>>>>>> The underlying structure changed, so a little bit of fiddling is normal. >>>>>>> Instead of using a field in the ompi_proc_t you are now using a field >>>>>>> down >>>>>>> in opal_proc_t, a field that simply cannot have the same type as before >>>>>>> (orte_process_name_t). >>>>>>> >>>>>>> George. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> >>>>>>> <r...@open-mpi.org> wrote: >>>>>>> >>>>>>> >>>>>>> George - my point was that we regularly tested using the method in that >>>>>>> routine, and now we have to do something a little different. So it is an >>>>>>> "issue" in that we have to make changes across the code base to ensure >>>>>>> we >>>>>>> do things the "new" way, that's all >>>>>>> >>>>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> >>>>>>> <bosi...@icl.utk.edu> wrote: >>>>>>> >>>>>>> No, this is not going to be an issue if the opal_identifier_t is used >>>>>>> correctly (aka only via the exposed accessors). >>>>>>> >>>>>>> George. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> >>>>>>> <r...@open-mpi.org> wrote: >>>>>>> >>>>>>> >>>>>>> Yeah, my fix won't work for big endian machines - this is going to be an >>>>>>> issue across the code base now, so we'll have to troll and fix it. I was >>>>>>> doing the minimal change required to fix the trunk in the meantime. >>>>>>> >>>>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> >>>>>>> <bosi...@icl.utk.edu> wrote: >>>>>>> >>>>>>> Yes. opal_process_name_t has basically no meaning by itself, it is a 64 >>>>>>> bits storage location used by the upper layer to save some local key >>>>>>> that >>>>>>> can be later used to extract information. Calling the OPAL level compare >>>>>>> function might be a better fit there. >>>>>>> >>>>>>> George. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet >>>>>>> <gilles.gouaillar...@gmail.com> wrote: >>>>>>> >>>>>>> >>>>>>> Ralph, >>>>>>> >>>>>>> was it really that simple ? 
>>>>>>> >>>>>>> proc_temp->super.proc_name has type opal_process_name_t : >>>>>>> typedef opal_identifier_t opal_process_name_t; >>>>>>> typedef uint64_t opal_identifier_t; >>>>>>> >>>>>>> *but* >>>>>>> >>>>>>> item_ptr->peer has type orte_process_name_t : >>>>>>> struct orte_process_name_t { >>>>>>> orte_jobid_t jobid; >>>>>>> orte_vpid_t vpid; >>>>>>> }; >>>>>>> >>>>>>> bottom line, is r32357 still valid on a big endian arch ? >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Gilles >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> >>>>>>> <r...@open-mpi.org> >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> I just fixed this one - all that was required was an ampersand as the >>>>>>> name was being passed into the function instead of a pointer to the name >>>>>>> >>>>>>> r32357 >>>>>>> >>>>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET >>>>>>> <gilles.gouaillar...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> Rolf, >>>>>>> >>>>>>> r32353 can be seen as a suspect... >>>>>>> Even if it is correct, it might have exposed the bug discussed in #4815 >>>>>>> even more (e.g. we hit the bug 100% after the fix) >>>>>>> >>>>>>> does the attached patch to #4815 fixes the problem ? >>>>>>> >>>>>>> If yes, and if you see this issue as a showstopper, feel free to commit >>>>>>> it and drop a note to #4815 >>>>>>> ( I am afk until tomorrow) >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Gilles >>>>>>> >>>>>>> Rolf vandeVaart <rvandeva...@nvidia.com> <rvandeva...@nvidia.com> wrote: >>>>>>> >>>>>>> Just an FYI that my trunk version (r32355) does not work at all anymore >>>>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the >>>>>>> ibm/pt2pt/send test running on a single node. >>>>>>> >>>>>>> >>>>>>> >>>>>>> (gdb) where >>>>>>> >>>>>>> #0 0x00007f6c0d1321d0 in ?? 
() >>>>>>> >>>>>>> #1 <signal handler called> >>>>>>> >>>>>>> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 >>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at >>>>>>> ../../orte/util/name_fns.c:522 >>>>>>> >>>>>>> #3 0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection >>>>>>> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, >>>>>>> peer_list=0x7f6c0c0a6748, >>>>>>> back_files=0x7f6bf3ffd6c8, >>>>>>> >>>>>>> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 >>>>>>> "sm_payload_mem_", map_all=false) at >>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237 >>>>>>> >>>>>>> #4 0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti >>>>>>> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, >>>>>>> reg_data=0xba28c0) >>>>>>> >>>>>>> at >>>>>>> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302 >>>>>>> >>>>>>> #5 0x00007f6c0cced386 in mca_coll_ml_register_bcols >>>>>>> (ml_module=0xba5c40) at >>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510 >>>>>>> >>>>>>> #6 0x00007f6c0cced68f in ml_module_memory_initialization >>>>>>> (ml_module=0xba5c40) at >>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558 >>>>>>> >>>>>>> #7 0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at >>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539 >>>>>>> >>>>>>> #8 0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, >>>>>>> priority=0x7fffe7991b58) at >>>>>>> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963 >>>>>>> >>>>>>> #9 0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, >>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) >>>>>>> >>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372 >>>>>>> >>>>>>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, >>>>>>> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) >>>>>>> >>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355 
>>>>>>> >>>>>>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, >>>>>>> component=0x7f6c0cf50940, module=0x7fffe7991b90) >>>>>>> >>>>>>> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317 >>>>>>> >>>>>>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, >>>>>>> comm=0x6037a0) at >>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281 >>>>>>> >>>>>>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at >>>>>>> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117 >>>>>>> >>>>>>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, >>>>>>> requested=0, provided=0x7fffe79922e8) at >>>>>>> ../../ompi/runtime/ompi_mpi_init.c:918 >>>>>>> >>>>>>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, >>>>>>> argv=0x7fffe7992340) at pinit.c:84 >>>>>>> >>>>>>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at >>>>>>> send.c:32 >>>>>>> >>>>>>> (gdb) up >>>>>>> >>>>>>> #1 <signal handler called> >>>>>>> >>>>>>> (gdb) up >>>>>>> >>>>>>> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 >>>>>>> '\017', name1=0x192350001, name2=0xbaf76c) at >>>>>>> ../../orte/util/name_fns.c:522 >>>>>>> >>>>>>> 522 if (name1->jobid < name2->jobid) { >>>>>>> >>>>>>> (gdb) print name1 >>>>>>> >>>>>>> $1 = (const orte_process_name_t *) 0x192350001 >>>>>>> >>>>>>> (gdb) print *name1 >>>>>>> >>>>>>> Cannot access memory at address 0x192350001 >>>>>>> >>>>>>> (gdb) print name2 >>>>>>> >>>>>>> $2 = (const orte_process_name_t *) 0xbaf76c >>>>>>> >>>>>>> (gdb) print *name2 >>>>>>> >>>>>>> $3 = {jobid = 2452946945, vpid = 1} >>>>>>> >>>>>>> (gdb) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: devel [mailto:devel-boun...@open-mpi.org >>>>>>> <devel-boun...@open-mpi.org> >>>>>>> >>>>>>> >>>>>>> >>>>>>> <devel-boun...@open-mpi.org> <devel-boun...@open-mpi.org>] On Behalf Of >>>>>>> Gilles >>>>>>> >>>>>>> >>>>>>> 
Gouaillardet >>>>>>> Sent: Wednesday, July 30, 2014 2:16 AM >>>>>>> To: Open MPI Developers >>>>>>> Subject: Re: [OMPI devel] trunk compilation errors in jenkins >>>>>>> George, >>>>>>> #4815 is indirectly related to the move : >>>>>>> in bcol/basesmuma, we used to compare ompi_process_name_t, and now >>>>>>> we (try to) compare an ompi_process_name_t and an opal_process_name_t >>>>>>> (which causes a glory SIGSEGV) >>>>>>> i proposed a temporary patch which is both broken and unelegant, could >>>>>>> >>>>>>> you >>>>>>> >>>>>>> >>>>>>> please advise a correct solution ? >>>>>>> Cheers, >>>>>>> Gilles >>>>>>> On 2014/07/27 7:37, George Bosilca wrote: >>>>>>> >>>>>>> If you have any issue with the move, I'll be happy to help and/or >>>>>>> >>>>>>> support >>>>>>> >>>>>>> >>>>>>> you on your last move toward a completely generic BTL. To facilitate >>>>>>> >>>>>>> your >>>>>>> >>>>>>> >>>>>>> work I exposed a minimalistic set of OMPI information at the OPAL >>>>>>> >>>>>>> level. Take >>>>>>> >>>>>>> >>>>>>> a look at opal/util/proc.h for more info, but please try not to expose >>>>>>> >>>>>>> more. >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> devel mailing listde...@open-mpi.org >>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>>>> Link to this post: http://www.open- >>>>>>> >>>>>>> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >>>>>>> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >>>>>>> >>>>>>> mpi.org/community/lists/devel/2014/07/15348.php >>>>>>> >>>>>>> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >>>>>>> <http://www.open-mpi.org/community/lists/devel/2014/07/15348.php> >>>>>>> >>>>>>> >>>>>>> ------------------------------ >>>>>>> This email message is for the sole use of the intended recipient(s) >>>>>>> and may contain confidential information. Any unauthorized review, use, >>>>>>> disclosure or distribution is prohibited. 
Index: opal/util/proc.h =================================================================== --- opal/util/proc.h (revision 32440) +++ opal/util/proc.h (working copy) @@ -21,7 +21,7 @@ #include "opal/dss/dss.h" #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT -#include <arpa/inet.h> +#include "opal/types.h" #endif /** @@ -35,22 +35,11 @@ typedef opal_identifier_t opal_process_name_t; #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN) -#define OPAL_PROCESS_NAME_NTOH(guid) opal_process_name_ntoh_intr(&(guid)) -static inline __opal_attribute_always_inline__ void -opal_process_name_ntoh_intr(opal_process_name_t *name) -{ - uint32_t * w = (uint32_t *)name; - w[0] = ntohl(w[0]); - w[1] = ntohl(w[1]); -} -#define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid)) -static inline __opal_attribute_always_inline__ void -opal_process_name_hton_intr(opal_process_name_t *name) -{ - uint32_t * w = (uint32_t *)name; - w[0] = htonl(w[0]); - w[1] = htonl(w[1]); -} +#define OPAL_PROCESS_NAME_NTOH(guid) \ + guid = ntoh64(guid) + +#define OPAL_PROCESS_NAME_HTON(guid) \ + guid = hton64(guid) #else #define OPAL_PROCESS_NAME_NTOH(guid) #define OPAL_PROCESS_NAME_HTON(guid) Index: orte/include/orte/types.h =================================================================== --- orte/include/orte/types.h (revision 32440) +++ orte/include/orte/types.h (working copy) @@ -10,6 +10,8 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2014 Intel, Inc. All rights reserved. + * Copyright (c) 2014 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -83,18 +85,18 @@ #define ORTE_VPID_MAX UINT32_MAX-2 #define ORTE_VPID_MIN 0 -#define ORTE_PROCESS_NAME_HTON(n) \ -do { \ - n.jobid = htonl(n.jobid); \ - n.vpid = htonl(n.vpid); \ -} while (0) +#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN) +#define ORTE_PROCESS_NAME_HTON(n) \ + OPAL_PROCESS_NAME_HTON(*(opal_process_name_t *)&(n)) -#define ORTE_PROCESS_NAME_NTOH(n) \ -do { \ - n.jobid = ntohl(n.jobid); \ - n.vpid = ntohl(n.vpid); \ -} while (0) +#define ORTE_PROCESS_NAME_NTOH(n) \ + OPAL_PROCESS_NAME_NTOH(*(opal_process_name_t *)&(n)) +#else +#define ORTE_PROCESS_NAME_HTON(n) +#define ORTE_PROCESS_NAME_NTOH(n) +#endif + #define ORTE_NAME_ARGS(n) \ (unsigned long) ((NULL == n) ? (unsigned long)ORTE_JOBID_INVALID : (unsigned long)(n)->jobid), \ (unsigned long) ((NULL == n) ? (unsigned long)ORTE_VPID_INVALID : (unsigned long)(n)->vpid) \ @@ -115,11 +117,23 @@ /* * define the process name structure + * the OPAL layer sees an orte_process_name_t as an opal_process_name_t aka uint64_t + * if heterogeneous is supported, when converting this uint64_t to + * an endian neutral format, vpid and jobid will be swapped. + * consequently, the orte_process_name_t struct must have different definitions + * (swap jobid and vpid) on little and big endian arch. */ +#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN) struct orte_process_name_t { + orte_vpid_t vpid; /**< Process id - equivalent to rank */ orte_jobid_t jobid; /**< Job number */ +}; +#else +struct orte_process_name_t { + orte_jobid_t jobid; /**< Job number */ orte_vpid_t vpid; /**< Process id - equivalent to rank */ }; +#endif typedef struct orte_process_name_t orte_process_name_t;
Index: oshmem/mca/scoll/mpi/scoll_mpi_module.c =================================================================== --- oshmem/mca/scoll/mpi/scoll_mpi_module.c (revision 32440) +++ oshmem/mca/scoll/mpi/scoll_mpi_module.c (working copy) @@ -1,11 +1,13 @@ /** - Copyright (c) 2011 Mellanox Technologies. All rights reserved. - Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. - $COPYRIGHT$ - - Additional copyrights may follow - - $HEADER$ + * Copyright (c) 2011 Mellanox Technologies. All rights reserved. + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2014 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ */ #include "ompi_config.h" @@ -125,7 +127,7 @@ ompi_proc_t* ompi_proc; for( int j = 0; j < ompi_group_size(parent_group); j++ ) { ompi_proc = ompi_group_peer_lookup(parent_group, j); - if( ompi_proc->super.proc_name == osh_group->proc_array[i]->super.proc_name) { + if( ompi_proc->super.proc_name.id == osh_group->proc_array[i]->super.proc_name.id) { ranks[i] = j; break; } Index: opal/mca/btl/tcp/btl_tcp_proc.c =================================================================== --- opal/mca/btl/tcp/btl_tcp_proc.c (revision 32440) +++ opal/mca/btl/tcp/btl_tcp_proc.c (working copy) @@ -12,6 +12,8 @@ * All rights reserved. * Copyright (c) 2008-2010 Oracle and/or its affiliates. All rights reserved * Copyright (c) 2013 Intel, Inc. All rights reserved + * Copyright (c) 2014 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -77,7 +79,7 @@
     /* remove from list of all proc instances */
     OPAL_THREAD_LOCK(&mca_btl_tcp_component.tcp_lock);
     opal_hash_table_remove_value_uint64(&mca_btl_tcp_component.tcp_procs,
-                                        tcp_proc->proc_opal->proc_name);
+                                        tcp_proc->proc_opal->proc_name.id);
     OPAL_THREAD_UNLOCK(&mca_btl_tcp_component.tcp_lock);
 
     /* release resources */
@@ -97,7 +99,7 @@
 
 mca_btl_tcp_proc_t* mca_btl_tcp_proc_create(const opal_proc_t* proc)
 {
-    uint64_t hash = proc->proc_name;
+    uint64_t hash = proc->proc_name.id;
     mca_btl_tcp_proc_t* btl_proc;
     size_t size;
     int rc;
@@ -719,7 +721,7 @@
     mca_btl_tcp_proc_t* proc = NULL;
     OPAL_THREAD_LOCK(&mca_btl_tcp_component.tcp_lock);
     opal_hash_table_get_value_uint64(&mca_btl_tcp_component.tcp_procs,
-                                     *name, (void**)&proc);
+                                     name->id, (void**)&proc);
     OPAL_THREAD_UNLOCK(&mca_btl_tcp_component.tcp_lock);
     return proc;
 }
Index: opal/mca/btl/openib/btl_openib.c
===================================================================
--- opal/mca/btl/openib/btl_openib.c	(revision 32440)
+++ opal/mca/btl/openib/btl_openib.c	(working copy)
@@ -1064,7 +1064,7 @@
             rc = mca_btl_openib_ib_address_add_new(
                     ib_proc->proc_ports[j].pm_port_info.lid,
                     ib_proc->proc_ports[j].pm_port_info.subnet_id,
-                    opal_process_name_jobid(proc->proc_name), endpoint);
+                    proc->proc_name, endpoint);
             if (OPAL_SUCCESS != rc ) {
                 OPAL_THREAD_UNLOCK(&ib_proc->proc_lock);
                 return OPAL_ERROR;
Index: opal/util/proc.c
===================================================================
--- opal/util/proc.c	(revision 32440)
+++ opal/util/proc.c	(working copy)
@@ -3,6 +3,8 @@
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2013      Inria.  All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -29,7 +31,7 @@
 
 static opal_proc_t opal_local_proc = {
     { .opal_list_next = NULL, .opal_list_prev = NULL},
-    0x1122334455667788,
+    { .id = 0x1122334455667788},
     0,
     0,
     NULL,
@@ -42,13 +44,13 @@
     proc->proc_arch = opal_local_arch;
     proc->proc_convertor = NULL;
     proc->proc_flags = 0;
-    proc->proc_name = 0;
+    proc->proc_name.id = 0;
 }
 
 static void opal_proc_destruct(opal_proc_t* proc)
 {
     proc->proc_flags = 0;
-    proc->proc_name = 0;
+    proc->proc_name.id = 0;
     proc->proc_hostname = NULL;
     proc->proc_convertor = NULL;
 }
@@ -60,8 +62,8 @@
 opal_compare_opal_procs(const opal_process_name_t proc1,
                         const opal_process_name_t proc2)
 {
-    if( proc1 == proc2 ) return 0;
-    if( proc1 < proc2 ) return -1;
+    if( proc1.id == proc2.id ) return 0;
+    if( proc1.id < proc2.id ) return -1;
     return 1;
 }
Index: opal/util/proc.h
===================================================================
--- opal/util/proc.h	(revision 32440)
+++ opal/util/proc.h	(working copy)
@@ -32,25 +32,30 @@
  * is to be copied from one structure to another, otherwise it should
  * only be used via the accessors defined below.
  */
-typedef opal_identifier_t opal_process_name_t;
+typedef uint32_t opal_jobid_t;
+typedef uint32_t opal_vpid_t;
+typedef struct {
+    opal_jobid_t jobid;
+    opal_vpid_t vpid;
+} opal_proc_name_t;
+typedef union {
+    opal_proc_name_t name;
+    opal_identifier_t id;
+} opal_process_name_t;
+
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
-#define OPAL_PROCESS_NAME_NTOH(guid) opal_process_name_ntoh_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_ntoh_intr(opal_process_name_t *name)
-{
-    uint32_t * w = (uint32_t *)name;
-    w[0] = ntohl(w[0]);
-    w[1] = ntohl(w[1]);
-}
-#define OPAL_PROCESS_NAME_HTON(guid) opal_process_name_hton_intr(&(guid))
-static inline __opal_attribute_always_inline__ void
-opal_process_name_hton_intr(opal_process_name_t *name)
-{
-    uint32_t * w = (uint32_t *)name;
-    w[0] = htonl(w[0]);
-    w[1] = htonl(w[1]);
-}
+#define OPAL_PROCESS_NAME_NTOH(n) \
+do { \
+    n.name.jobid = ntohl(n.name.jobid); \
+    n.name.vpid = ntohl(n.name.vpid); \
+} while (0)
+
+#define OPAL_PROCESS_NAME_HTON(n) \
+do { \
+    n.name.jobid = htonl(n.name.jobid); \
+    n.name.vpid = htonl(n.name.vpid); \
+} while (0)
 #else
 #define OPAL_PROCESS_NAME_NTOH(guid)
 #define OPAL_PROCESS_NAME_HTON(guid)
Index: ompi/mca/dpm/orte/dpm_orte.c
===================================================================
--- ompi/mca/dpm/orte/dpm_orte.c	(revision 32440)
+++ ompi/mca/dpm/orte/dpm_orte.c	(working copy)
@@ -16,6 +16,8 @@
 * Copyright (c) 2011-2013 Los Alamos National Security, LLC.  All rights
 *                         reserved.
 * Copyright (c) 2013-2014 Intel, Inc. All rights reserved
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -1767,7 +1769,7 @@
 }
 
 static void paccept_recv(int status,
-                         struct orte_process_name_t* peer,
+                         orte_process_name_t* peer,
                          struct opal_buffer_t* buffer,
                          orte_rml_tag_t tag,
                          void* cbdata)
Index: orte/mca/rml/rml.h
===================================================================
--- orte/mca/rml/rml.h	(revision 32440)
+++ orte/mca/rml/rml.h	(working copy)
@@ -11,6 +11,8 @@
 *                         All rights reserved.
 * Copyright (c) 2011-2013 Los Alamos National Security, LLC. All rights
 *                         reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -52,7 +54,6 @@
 
 struct opal_buffer_t;
-struct orte_process_name_t;
 struct orte_rml_module_t;
 
 typedef struct {
     opal_object_t super;
@@ -146,7 +147,7 @@
 * @param[in] cbdata User data passed to send_nb()
 */
 typedef void (*orte_rml_callback_fn_t)(int status,
-                                       struct orte_process_name_t* peer,
+                                       orte_process_name_t* peer,
                                        struct iovec* msg,
                                        int count,
                                        orte_rml_tag_t tag,
@@ -171,7 +172,7 @@
 * @param[in] cbdata User data passed to send_buffer_nb() or recv_buffer_nb()
 */
 typedef void (*orte_rml_buffer_callback_fn_t)(int status,
-                                              struct orte_process_name_t* peer,
+                                              orte_process_name_t* peer,
                                               struct opal_buffer_t* buffer,
                                               orte_rml_tag_t tag,
                                               void* cbdata);
@@ -315,7 +316,7 @@
 *  receiving process is not available
 * @retval ORTE_ERROR  An unspecified error occurred
 */
-typedef int (*orte_rml_module_send_nb_fn_t)(struct orte_process_name_t* peer,
+typedef int (*orte_rml_module_send_nb_fn_t)(orte_process_name_t* peer,
                                             struct iovec* msg,
                                             int count,
                                             orte_rml_tag_t tag,
@@ -345,7 +346,7 @@
 *  receiving process is not available
 * @retval ORTE_ERROR  An unspecified error occurred
 */
-typedef int (*orte_rml_module_send_buffer_nb_fn_t)(struct orte_process_name_t* peer,
+typedef int (*orte_rml_module_send_buffer_nb_fn_t)(orte_process_name_t* peer,
                                                    struct opal_buffer_t* buffer,
                                                    orte_rml_tag_t tag,
                                                    orte_rml_buffer_callback_fn_t cbfunc,
@@ -360,7 +361,7 @@
 * @param[in] cbfunc Callback function on message comlpetion
 * @param[in] cbdata User data to provide during completion callback
 */
-typedef void (*orte_rml_module_recv_nb_fn_t)(struct orte_process_name_t* peer,
+typedef void (*orte_rml_module_recv_nb_fn_t)(orte_process_name_t* peer,
                                              orte_rml_tag_t tag,
                                              bool persistent,
                                              orte_rml_callback_fn_t cbfunc,
@@ -376,7 +377,7 @@
 * @param[in] cbfunc Callback function on message comlpetion
 * @param[in] cbdata User data to provide during completion callback
 */
-typedef void (*orte_rml_module_recv_buffer_nb_fn_t)(struct orte_process_name_t* peer,
+typedef void (*orte_rml_module_recv_buffer_nb_fn_t)(orte_process_name_t* peer,
                                                     orte_rml_tag_t tag,
                                                     bool persistent,
                                                     orte_rml_buffer_callback_fn_t cbfunc,
@@ -427,7 +428,7 @@
 * to/from a specified process.  Used when a process aborts
 * and is to be restarted
 */
-typedef void (*orte_rml_module_purge_fn_t)(struct orte_process_name_t *peer);
+typedef void (*orte_rml_module_purge_fn_t)(orte_process_name_t *peer);
 
 /* ******************************************************************** */
Index: orte/mca/rml/base/base.h
===================================================================
--- orte/mca/rml/base/base.h	(revision 32440)
+++ orte/mca/rml/base/base.h	(working copy)
@@ -12,6 +12,8 @@
 *                         All rights reserved.
 * Copyright (c) 2007-2014 Los Alamos National Security, LLC.  All rights
 *                         reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -245,23 +247,23 @@
 ORTE_DECLSPEC void orte_rml_base_process_error(int fd, short flags, void *cbdata);
 
 /* null functions */
-int orte_rml_base_null_send_nb(struct orte_process_name_t* peer,
+int orte_rml_base_null_send_nb(orte_process_name_t* peer,
                                struct iovec* msg,
                                int count,
                                orte_rml_tag_t tag,
                                orte_rml_callback_fn_t cbfunc,
                                void* cbdata);
-int orte_rml_base_null_send_buffer_nb(struct orte_process_name_t* peer,
+int orte_rml_base_null_send_buffer_nb(orte_process_name_t* peer,
                                       struct opal_buffer_t* buffer,
                                       orte_rml_tag_t tag,
                                       orte_rml_buffer_callback_fn_t cbfunc,
                                       void* cbdata);
-void orte_rml_base_null_recv_nb(struct orte_process_name_t* peer,
+void orte_rml_base_null_recv_nb(orte_process_name_t* peer,
                                 orte_rml_tag_t tag,
                                 bool persistent,
                                 orte_rml_callback_fn_t cbfunc,
                                 void* cbdata);
-void orte_rml_base_null_recv_buffer_nb(struct orte_process_name_t* peer,
+void orte_rml_base_null_recv_buffer_nb(orte_process_name_t* peer,
                                        orte_rml_tag_t tag,
                                        bool persistent,
                                        orte_rml_buffer_callback_fn_t cbfunc,
Index: orte/mca/routed/routed.h
===================================================================
--- orte/mca/routed/routed.h	(revision 32440)
+++ orte/mca/routed/routed.h	(working copy)
@@ -51,7 +51,6 @@
 
 struct opal_buffer_t;
-struct orte_process_name_t;
 struct orte_rml_module_t;
Index: orte/include/orte/types.h
===================================================================
--- orte/include/orte/types.h	(revision 32440)
+++ orte/include/orte/types.h	(working copy)
@@ -10,6 +10,8 @@
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * Copyright (c) 2014      Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -27,6 +29,7 @@
 #include <sys/types.h>
 #endif
 #include "opal/dss/dss_types.h"
+#include "opal/util/proc.h"
 
 /**
  * Supported datatypes for messaging and storage operations.
@@ -74,11 +77,11 @@
 * the other, and it will cause problems in the communication subsystems
 */
 
-typedef uint32_t orte_jobid_t;
+typedef opal_jobid_t orte_jobid_t;
 #define ORTE_JOBID_T OPAL_UINT32
 #define ORTE_JOBID_MAX UINT32_MAX-2
 #define ORTE_JOBID_MIN 0
-typedef uint32_t orte_vpid_t;
+typedef opal_vpid_t orte_vpid_t;
 #define ORTE_VPID_T OPAL_UINT32
 #define ORTE_VPID_MAX UINT32_MAX-2
 #define ORTE_VPID_MIN 0
@@ -116,11 +119,7 @@
 /*
  * define the process name structure
  */
-struct orte_process_name_t {
-    orte_jobid_t jobid;     /**< Job number */
-    orte_vpid_t vpid;       /**< Process id - equivalent to rank */
-};
-typedef struct orte_process_name_t orte_process_name_t;
+typedef opal_proc_name_t orte_process_name_t;
 
 /**