Yo Galen

I'm not aware of any continuing discussion about removing the process name
from ORTE entirely - I believe we converged on redefining how the jobid is
established, using a procedure that doesn't require a name server. This
hasn't come over to the trunk yet, but will in the next couple of months.

Adding a field to the process name is an unfortunately non-trivial exercise
as it hits a lot of places, including messing with the headers in the RML
and IOF - and as you know, nobody really wants to mess with that code.

One way to resolve this would be to add your call to get the pid and nodeid
of all procs in your job to the CNOS SDS component since every process has
to call that function anyway. If we go that route, then the question becomes
how best to expose that data to the OMPI layer. Creating an orte_proc_t just
for that purpose seems slightly overkill - can anyone think of another
reason to have such an object?
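
To make the first option a bit more concrete, here is a rough sketch of how a nid/pid table filled in by the SDS component could answer the "is this proc local?" question without a full orte_proc_t. Every name in it (example_nidpid_t, example_proc_map_t, example_proc_is_local, example_count_local_peers) is hypothetical and made up for illustration; the table is assumed to be indexed by vpid and populated once at startup from whatever the CNOS call returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the nid/pid map. */
typedef struct {
    uint32_t nid;  /* node id */
    uint32_t pid;  /* process id on that node */
} example_nidpid_t;

/* Hypothetical table the SDS component would fill once at startup,
 * indexed by vpid (assumption: vpid == position in the map). */
typedef struct {
    const example_nidpid_t *entries;
    size_t num_procs;
    size_t my_vpid;
} example_proc_map_t;

/* Returns true if the peer with the given vpid is on our node. */
static bool example_proc_is_local(const example_proc_map_t *map, size_t vpid)
{
    assert(map != NULL);
    assert(vpid < map->num_procs && map->my_vpid < map->num_procs);
    return map->entries[vpid].nid == map->entries[map->my_vpid].nid;
}

/* Counts the peers (excluding ourselves) that share our node. */
static size_t example_count_local_peers(const example_proc_map_t *map)
{
    size_t count = 0;
    for (size_t i = 0; i < map->num_procs; ++i) {
        if (i != map->my_vpid && example_proc_is_local(map, i)) {
            ++count;
        }
    }
    return count;
}
```

With something like this, the shared memory code would only need the locality query, not the structure behind it, which is part of why a dedicated orte_proc_t may be more than is needed.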

An alternative solution might be to incorporate the modex in the new OMPI
framework I was about to create anyway. This framework was intended to deal
with publish/lookup of OMPI data to support a variety of methods.
Originally, we had intended only to include support there for things
specifically related to MPI_Publish etc., but there is no reason we couldn't
generalize it to support the general exchange of process "how to connect to
me" info and include a modex API in it. I was figuring we would need two
immediate components in it anyway: an ORTE one for when we have full ORTE
support in the system, and a CNOS one that would...well, I guess just bark
and say "you can't do publish/lookup on a Cray". It would be simple to add
the modex stuff there, and makes some logical sense as well.
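
As a rough illustration of what I have in mind, the framework could be a small table of function pointers that each component fills in. The sketch below is not an existing interface: the module struct, the error codes, and the function names are all hypothetical, and the hard-coded nid array merely stands in for data a real CNOS component would get from the system. It shows only the CNOS side; an ORTE component would fill the same table with working publish/lookup entries.

```c
#include <stddef.h>

/* Hypothetical return codes for this sketch. */
enum {
    EXAMPLE_SUCCESS           =  0,
    EXAMPLE_ERR_NOT_SUPPORTED = -1,
    EXAMPLE_ERR_BAD_PARAM     = -2
};

/* Hypothetical module interface: publish/lookup plus a modex entry. */
typedef struct {
    const char *name;
    int (*publish)(const char *service, const char *port);
    int (*modex_recv)(size_t vpid, unsigned *nid_out);
} example_pubsub_module_t;

/* CNOS side: publish is simply refused. */
static int example_cnos_publish(const char *service, const char *port)
{
    (void)service;
    (void)port;
    return EXAMPLE_ERR_NOT_SUPPORTED;
}

/* Stand-in data; a real component would obtain this from the system. */
static const unsigned example_cnos_nids[] = { 10, 10, 11 };

/* CNOS side: "how to connect to me" info comes from the local table,
 * so no data exchange between processes is required. */
static int example_cnos_modex_recv(size_t vpid, unsigned *nid_out)
{
    size_t n = sizeof(example_cnos_nids) / sizeof(example_cnos_nids[0]);
    if (NULL == nid_out || vpid >= n) {
        return EXAMPLE_ERR_BAD_PARAM;
    }
    *nid_out = example_cnos_nids[vpid];
    return EXAMPLE_SUCCESS;
}

static const example_pubsub_module_t example_cnos_module = {
    "cnos",
    example_cnos_publish,
    example_cnos_modex_recv
};
```

The OMPI layer would call through the selected module and never need an #if for the Cray case.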

If that makes sense, we can implement the latter approach on the branch
where we are doing the next major ORTE revision - that's where I was going
to create the new framework anyway.

Ralph


On 11/16/07 10:24 PM, "Shipman, Galen M." <gship...@ornl.gov> wrote:

> 
> I am doing some work on Cray's CNL to support shared memory. To support
> shared memory I need to know if processes are local or remote. For other
> systems we simply use the modex in ompi_proc_get_info to get the proc's
> nodeid. When using CNOS I don't need the modex to get a remote process's
> nodeid. In fact, I can obtain every process's pid and nodeid (nid/pid) via a
> single CNOS call.
> 
> I have explored a couple of ways to populate the proc structures on the
> Cray. One involves using #if's to do special things in ompi_proc_get_info. I
> don't like this. The second method involves adding a CNOS nameserver and
> modifying the orte_process_name_t to include the orte_nodeid_t so that the
> nameserver can populate all the info if it can. Prior to this change, the
> orte_nodeid_t was in ompi_proc_t, which doesn't make any sense to me: it is
> an ORTE-level concept, yet it is only accessible on the OMPI side. I also
> don't like adding orte_nodeid_t to orte_process_name_t, as it really doesn't
> have anything to do with a name. I think it makes more sense to have an
> orte_proc_t, something like the following structure:
> 
> 
> 
> struct orte_process_name_t {
>     orte_jobid_t jobid;     /**< Job number */
>     orte_vpid_t vpid;       /**< Process number */
> };
> 
> struct orte_proc_t {
>     opal_list_item_t super;
>     orte_process_name_t proc_name;
>     orte_nodeid_t nid;      /**< "nodeid" on which the proc resides */
> };
> 
> struct ompi_proc_t {
>     orte_proc_t base;
>     ..... Etc .....
>      
> };
> 
> 
> I know there is some talk about removing the process names. I'm not sure how
> that fits in here, but this is what makes sense to me given the current
> architecture. Any thoughts here?
> 
> 
> - Galen 
> 
> 
> 
> 

