I really don't have the background to discuss ORTE architectural decisions, so I will leave this to others.
What I need is the following: the ability to support shared memory on CNL. This requires knowing whether a proc is local, plus a mechanism to solve the SM init race condition. Here are the constraints as I see them:

- Don't want to #if the code
- Need to support this in 1.3, and probably 1.2 depending on the release of 1.3
- This shouldn't be a "hack" in 1.3
- 1.2 may be a bit more of a hack, as we are talking about a back port with a much shorter maintenance time frame than 1.3

Looks to me like we need a conference call to discuss this. Would sometime next week work?

Happy Thanksgiving all! I'm off to eat entirely too much...

- Galen

On 11/19/07 10:32 PM, "Ralph Castain" <r...@lanl.gov> wrote:

>
> On 11/19/07 6:20 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>
>> On Monday 19 November 2007 09:42:21 am Ralph H Castain wrote:
>> <snip>
>>> An alternative solution might be to incorporate the modex in the new OMPI
>>> framework I was about to create anyway. This framework was intended to deal
>>> with publish/lookup of OMPI data to support a variety of methods.
>>> Originally, we had intended only to include support there for things
>>> specifically related to MPI_Publish etc., but there is no reason we
>>> couldn't generalize it to support the general exchange of process "how to
>>> connect to me" info and include a modex API in it. I was figuring we would
>>> need two immediate components in it anyway: an ORTE one for when we have
>>> full ORTE support in the system, and a CNOS one that would...well, I guess
>>> just bark and say "you can't do publish/lookup on a Cray". It would be
>>> simple to add the modex stuff there, and it makes some logical sense as well.
>> I think this approach is fundamentally flawed. Our frameworks are designed to
>> abstract out something, to allow for multiple implementations. However, doing
>> this would put two completely different things (the modex and the MPI
>> pub/sub) together in one framework.
>> While this may be convenient for the Cray, it would be very inconvenient for
>> someone who wanted to do the MPI pub/sub via an LDAP server (as has been
>> discussed). The key here is that MPI pub/sub is for very small amounts of
>> data, accessed infrequently and in a non-performance-critical manner,
>> whereas the modex is for rather large amounts of information (on big jobs)
>> that has to be exchanged efficiently.
>
> Actually, several people talked about this before we proposed it and came to
> a different conclusion. The modex is in essence a "here's how to talk to me"
> communication, which is the same intent as publish/lookup. I agree that the
> volume of data involved is different. However, we are -not- proposing to use
> the same mechanism for the two (modex vs. pub/lookup).
>
> The proposal was based on the fact that the publish/lookup and modex
> effectively use similar mechanisms - i.e., the orte component would use the
> RML as the underlying communication mechanism. In contrast, the Cray
> component has alternative non-RML based mechanisms for both systems.
>
> Things like the LDAP server pose an interesting challenge. In that case, the
> publish/lookup cannot use the RML, as LDAP has no understanding of that comm
> mode. The modex, however, might - and might not - use that mechanism.
> Accordingly, the plan was to provide base functions that use RML for any
> component that can and wants to do so. This is identical to the approach we
> use throughout the code base.
>
> However, we do need the modex in a framework somewhere, as we will need to
> modify it to support tight integration with various environments. I cannot
> see doing every tight integration with yet another RSL component, as the code
> duplication gets absurd - there isn't enough difference to support it.
> I also, though, don't want to be forced to use the same modex in every case if
> the native environment can provide an alternative method - having the modex
> in the framework solves that problem.
>
> So I guess I don't grok the issue here - what is wrong with having a modex
> API in the pub/sub framework??? Other than causing you some additional merge
> issues within RSL, I fail to understand why this is a problem.
>
>>
>> Before anyone misunderstands, I am *not* proposing that we add a modex
>> framework to ompi. Rather, I think this is a case where the RSL could make
>> things really easy.
>>
>> The RSL defines a process attribute system. One of the original ideas (later
>> retracted, but now that I think about it I may re-add it) was to have some
>> predefined attribute keys that the runtime would set, so we could look up
>> information about any process.
>>
>> So in the case of the Cray, the rsl_init function would query to get all the
>> info it wants, and then populate the info into its (local) process attribute
>> data store.
>>
>> In other systems each process would set the information in rsl_init and it
>> would be exchanged in the normal modex method.
>>
>> Then, the information would be looked up (locally) using the 'get' function
>> on both platforms.
>>
>> Simple, eh?
>
> Maybe - and maybe not. The devil is always in the details. My concerns with
> the RSL have been documented and wildly misunderstood. I still fail to see
> the overall advantage, as it seems we get different explanations every time
> we ask. But I'll set that aside here.
>
> FWIW: The publish/lookup interface was specifically required to support both
> local and remote data storage operations, though that doesn't really apply
> to the modex.
>
>>
>> As an alternative to this, I think we could apply these same ideas to a
>> specialized ORTE system, but it would not be as clean, and would tie our
>> system closer to ORTE.
>> I am not going to argue whether this is good or bad, but I am just
>> mentioning it as a consequence.
>
> My concern right now is that doing it in RSL means (as we chatted about
> offline) integrating RSL into the OMPI trunk NOW - either directly or as
> part of the orte revision branch. This will certainly delay getting the ORTE
> revision done, maybe by as much as 3 months or more (IMHO). I will contact
> LANL management to seek their input on this matter, but I doubt they will be
> supportive, as such a delay will cause LANL to miss several critical
> RoadRunner milestones - which would almost certainly negatively impact our
> RoadRunner commercial partners as well.
>
> Alternatively, I suppose we could just fork the code base at this time, and
> I'll complete the orte revisions on a LANL server. I hate to do this,
> though, as it means someone (LANL, IBM, Voltaire, some combination, or
> whoever) will be left with the problem of either re-merging the branches
> or supporting a split code. I only offer it as an option we could
> consider, if necessary.
>
> Given those potential consequences, it would really help to have some
> substantive reason -why- the framework is unacceptable. I grok that you feel
> the RSL offers a possibly better alternative, but why does that mean we
> shouldn't do the framework now and worry about that if/when the RSL is
> proposed for production?
>
>>
>> Tim
>>
>>>
>>> If that makes sense, we can implement the latter approach on the branch
>>> where we are doing the next major ORTE revision - that's where I was going
>>> to create the new framework anyway.
>>>
>>> Ralph
>>>
>>> On 11/16/07 10:24 PM, "Shipman, Galen M." <gship...@ornl.gov> wrote:
>>>> I am doing some work on Cray's CNL to support shared memory. To support
>>>> shared memory I need to know if processes are local or remote. For other
>>>> systems we simply use the modex in ompi_proc_get_info to get the proc's
>>>> nodeid.
>>>> When using CNOS I don't need the modex to get a remote process's
>>>> nodeid. In fact, I can obtain every process's pid and nodeid (nid/pid)
>>>> via a single CNOS call.
>>>>
>>>> I have explored a couple of ways to populate the proc structures on the
>>>> Cray. One involves using #if's to do special things in
>>>> ompi_proc_get_info. I don't like this. The second method involves adding
>>>> a CNOS nameserver and modifying the orte_process_name_t to include the
>>>> orte_nodeid_t so that the nameserver can populate all the info if it can.
>>>> Prior to this change, the orte_nodeid_t was in ompi_proc_t, which doesn't
>>>> make any sense to me: it is an ORTE-level concept, yet it is only
>>>> accessible on the OMPI side. I also don't like adding orte_nodeid_t to
>>>> orte_process_name_t, as it really doesn't have anything to do with a
>>>> name. I think it makes more sense to have an orte_proc_t, something
>>>> like the following structure:
>>>>
>>>> struct orte_process_name_t {
>>>>     orte_jobid_t jobid;   /**< Job number */
>>>>     orte_vpid_t  vpid;    /**< Process number */
>>>> };
>>>>
>>>> struct orte_proc_t {
>>>>     opal_list_item_t    super;
>>>>     orte_process_name_t proc_name;
>>>>     orte_nodeid_t       nid;   /**< "nodeid" on which the proc resides */
>>>> };
>>>>
>>>> struct ompi_proc_t {
>>>>     orte_proc_t base;
>>>>     /* ... remaining OMPI-specific fields ... */
>>>> };
>>>>
>>>> I know there is some talk about removing the process names... not sure
>>>> how that fits in here, but this is what makes sense to me given the
>>>> current architecture. Any thoughts here?
>>>>
>>>> - Galen
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel