On 1/29/07 5:57 PM, "Greg Watson" <gwat...@lanl.gov> wrote:

> Ralph,
> 
> On Jan 29, 2007, at 11:10 AM, Ralph H Castain wrote:
> 
>> 
>> 
>> 
>> On 1/29/07 10:20 AM, "Greg Watson" <gwat...@lanl.gov> wrote:
>> 
>>> 
>>> No, we have always called query() first, just after orte_init().
>>> Since query() has never required a job id before, this used to work.
>>> I think the call was required to kick the SOH into action, but I'm
>>> not sure if it was needed for any other purpose.
>> 
>> Query has nothing to do with the SOH - the only time you would "need"
>> it would be if you are reading a hostfile. Otherwise, it doesn't do
>> anything at the moment.
>> 
>> 
>> Not calling setup_job would be risky, in my opinion...
> 
> We've had this discussion before. We *need* to read the hostfile
> before calling setup_job() because we have to populate the registry
> with node information. If you're saying that this is now no longer
> possible, then I'd respectfully ask that this functionality be
> restored before you release 1.2. If there is some other way to
> achieve this, then please let me know. We've been doing this ever
> since 1.0 and in the alpha and beta versions of 1.2.

I think there is a misunderstanding about what setup_job does. It takes
four arguments:

(a) an array of app_context objects that contain the application to be
launched;

(b) the number of elements in that array;

(c) a pointer to a location where the jobid for this job is to be returned;
and

(d) a list of attributes that allows the caller to "fine-tune" behavior.

With that info, setup_job will:

(a) create a new jobid for your application; and

(b) store the app_context info in appropriate places in the registry.

And that is *all* setup_job does - it simply gets a jobid and initializes
some important info in the registry. It never looks at node information, nor
does it in any way impact node info.
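
To make that contract concrete, here is a minimal mock in C. It is only a sketch: every type and function name in it (mock_app_context_t, mock_attr_list_t, mock_registry, mock_setup_job, and so on) is a hypothetical stand-in invented for illustration, not the real ORTE API, and the "registry" is just a static struct. It shows the four arguments described above and that the call only assigns a jobid and stores the app_context info, leaving node information untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Everything here is a hypothetical stand-in, NOT the real ORTE API. */
typedef struct { const char *app; int num_procs; } mock_app_context_t;
typedef struct { const char *key; int value; } mock_attribute_t;
typedef struct { const mock_attribute_t *items; size_t count; } mock_attr_list_t;
typedef unsigned int mock_jobid_t;

#define MOCK_MAX_JOBS 8
#define MOCK_MAX_APPS 4

typedef struct {
    mock_jobid_t jobid;
    size_t num_apps;
    mock_app_context_t apps[MOCK_MAX_APPS];
} mock_job_record_t;

typedef struct {
    mock_job_record_t jobs[MOCK_MAX_JOBS];
    size_t num_jobs;
    size_t num_nodes;   /* node info: never read or written by setup_job */
} mock_registry_t;

mock_registry_t mock_registry;  /* zero-initialized toy registry */

/* Four arguments, as described above: the app_context array, its length,
 * a location for the returned jobid, and an attribute list. */
int mock_setup_job(const mock_app_context_t *apps, size_t num_apps,
                   mock_jobid_t *jobid, const mock_attr_list_t *attrs)
{
    mock_job_record_t *rec;

    (void)attrs;  /* attributes would fine-tune behavior; ignored in this sketch */

    if (apps == NULL || jobid == NULL || num_apps == 0 ||
        num_apps > MOCK_MAX_APPS || mock_registry.num_jobs >= MOCK_MAX_JOBS) {
        return -1;
    }

    /* (a) create a new jobid */
    rec = &mock_registry.jobs[mock_registry.num_jobs];
    rec->jobid = (mock_jobid_t)(mock_registry.num_jobs + 1);

    /* (b) store the app_context info in the registry */
    rec->num_apps = num_apps;
    memcpy(rec->apps, apps, num_apps * sizeof(apps[0]));
    mock_registry.num_jobs++;

    *jobid = rec->jobid;
    return 0;
}
```

Note that nothing in the mock touches num_nodes, which is the point: node information can be populated before or after this call without any interaction.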

Calling rds.query after rmgr.setup_job is how we always do it. In truth, the
precise ordering of those two operations is immaterial as they have
absolutely nothing in common. However, we always do it in the described
order so that rds.query can have a valid jobid. As I said, at the moment
rds.query doesn't actually use the jobid, though that will change at some
point in the future.

Although it isn't *absolutely* necessary, I would still suggest that you
call rmgr.setup_job before calling rds.query to ensure that any subsequent
operations have all the info they require to function correctly. You can see
the progression we use in orte/mca/rmgr/urm/rmgr_urm.c - I believe you will
find it helpful to follow that logic.
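
As a rough illustration of the ordering argument, the following C sketch models the two calls as independent operations on a toy state. All names (mock_state_t, mock_setup_job, mock_rds_query, mock_launch_prologue) are hypothetical and the logic is an assumption made for illustration; it is not taken from rmgr_urm.c. The only thing it demonstrates is that either order produces the same job and node state, while setup-first gives the query a valid jobid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real ORTE API. */
typedef unsigned int mock_jobid_t;
#define MOCK_JOBID_INVALID 0u

typedef struct {
    size_t num_jobs;               /* jobs known to the toy registry */
    size_t num_nodes;              /* nodes read from the hostfile */
    mock_jobid_t last_query_jobid; /* jobid the query was handed */
} mock_state_t;

/* Creates a jobid; does not look at node information. */
mock_jobid_t mock_setup_job(mock_state_t *st)
{
    st->num_jobs++;
    return (mock_jobid_t)st->num_jobs;
}

/* Populates node information; records the jobid but does not use it,
 * mirroring the current behavior described above. */
void mock_rds_query(mock_state_t *st, mock_jobid_t jobid, size_t hostfile_nodes)
{
    st->num_nodes += hostfile_nodes;
    st->last_query_jobid = jobid;
}

/* The suggested order: setup_job first, so the query gets a valid jobid. */
mock_state_t mock_launch_prologue(size_t hostfile_nodes)
{
    mock_state_t st = { 0, 0, MOCK_JOBID_INVALID };
    mock_jobid_t jobid = mock_setup_job(&st);
    mock_rds_query(&st, jobid, hostfile_nodes);
    return st;
}
```

Calling mock_rds_query with MOCK_JOBID_INVALID before mock_setup_job ends with the same num_jobs and num_nodes; the difference would only matter once the query starts using the jobid.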

Alternatively, if you want, you can simply call orte_rmgr.spawn repeatedly
and use the attributes I built for you to step your way through the standard
launch. As you probably recall, I gave you the ability to specify - at a
very fine-grained level - exactly which steps in the spawn process are to be
executed on each call into rmgr.spawn. You can look at the referenced
file to see the attribute for each step in the procedure.
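
The stepwise idea can be sketched in C as follows. This is purely illustrative: the step flags (MOCK_STEP_SETUP, MOCK_STEP_QUERY, MOCK_STEP_LAUNCH), the state struct, and mock_spawn are hypothetical names, and the set of steps is an assumption; the actual attributes are the ones defined in the referenced file. The sketch only shows the pattern of one entry point being called repeatedly, with the caller selecting which steps run on each call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step selectors, NOT the real ORTE attributes. */
enum {
    MOCK_STEP_SETUP  = 1u << 0,  /* obtain a jobid, store app info */
    MOCK_STEP_QUERY  = 1u << 1,  /* read the hostfile / discover nodes */
    MOCK_STEP_LAUNCH = 1u << 2   /* start the processes */
};

typedef struct {
    unsigned done;   /* bitmask of steps already completed */
    unsigned jobid;  /* 0 until the setup step has run */
} mock_spawn_state_t;

/* Runs only the steps requested in 'steps'; may be called repeatedly. */
int mock_spawn(mock_spawn_state_t *st, unsigned steps)
{
    if (st == NULL) {
        return -1;
    }
    if (steps & MOCK_STEP_SETUP) {
        st->jobid = 1;
        st->done |= MOCK_STEP_SETUP;
    }
    if (steps & MOCK_STEP_QUERY) {
        st->done |= MOCK_STEP_QUERY;
    }
    if (steps & MOCK_STEP_LAUNCH) {
        /* In this sketch, launching requires that a jobid already exists. */
        if (!(st->done & MOCK_STEP_SETUP)) {
            return -1;
        }
        st->done |= MOCK_STEP_LAUNCH;
    }
    return 0;
}
```

A caller that wants to read the hostfile early could request the query step on the first call and the remaining steps on later calls.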


> 
>> 
>> 
>>> 
>>> Are there likely to be further API changes before the release
>>> version? We are trying to release PTP, but I think this is impossible
>>> until your API's stabilize.
>> 
>> None planned, other than what I mentioned above. If you want to
>> support Open MPI 1.2, you may need a slight phase shift, though, so
>> you can see the final release.
> 
> Please explain "phase shift".
> 
> Greg
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

