Job 0 is mpirun and its daemons. I can have orte-ps ignore that job, as I 
doubt users care :-)
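
For anyone consuming this from a tool, here is a rough Python sketch of how the --parseable output quoted below could be read while dropping the empty daemon job. The names parse_record and parse_orte_ps are made up for illustration and are not part of Open MPI. It assumes every line is a flat run of key:value pairs, that neither keys nor values contain a ':' themselves, and that only one mpirun is being reported.

```python
# Sample copied from the orte-ps --parseable output quoted in this thread.
SAMPLE = """\
mpirun:47520:num nodes:1:num jobs:2
jobid:0:state:RUNNING:slots:0:num procs:0
jobid:1:state:RUNNING:slots:1:num procs:4
process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
"""


def parse_record(line):
    """Split one 'key:value:key:value' line into a dict of strings.

    Assumption: no key or value contains a ':' of its own.
    """
    fields = line.strip().split(":")
    if len(fields) % 2 != 0:
        raise ValueError(f"odd number of fields in line: {line!r}")
    return dict(zip(fields[0::2], fields[1::2]))


def parse_orte_ps(text, skip_empty_jobs=True):
    """Return (mpirun_header, jobs); each job dict gets a 'procs' list.

    Hypothetical helper: handles a single mpirun only. With
    skip_empty_jobs=True, jobs reporting 'num procs' of 0 (such as the
    job holding mpirun and its daemons) are dropped.
    """
    header = None
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        kind = line.split(":", 1)[0]
        record = parse_record(line)
        if kind == "mpirun":
            header = record
        elif kind == "jobid":
            record["procs"] = []
            jobs.append(record)
        elif kind in ("process", "process_name"):
            if not jobs:
                raise ValueError("process line seen before any jobid line")
            jobs[-1]["procs"].append(record)
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    if skip_empty_jobs:
        jobs = [job for job in jobs if job.get("num procs") != "0"]
    return header, jobs


if __name__ == "__main__":
    header, jobs = parse_orte_ps(SAMPLE)
    for job in jobs:
        print(job["jobid"], job["state"], len(job["procs"]))
```

All values are left as strings; a caller that needs integers for rank or pid would convert them itself.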

On Jul 25, 2011, at 12:25 PM, Greg Watson wrote:

> Ralph,
> 
> The output format looks good, but I'm not sure it's quite correct. If I run 
> the mpirun command, I see the following:
> 
> mpirun:47520:num nodes:1:num jobs:2
> jobid:0:state:RUNNING:slots:0:num procs:0
> jobid:1:state:RUNNING:slots:1:num procs:4
> process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
> process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
> process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
> process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
> 
> Seems to indicate there are two jobs, but one of them has 0 procs. Is that 
> expected? Not a huge problem, since I can just ignore the job with 0 procs.
> 
> Greg
> 
> 
> On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:
> 
>> Okay, you should have it in r24929. Use:
>> 
>> orte-ps --parseable
>> 
>> to get the new output.
>> 
>> 
>> On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
>> 
>>> Gar - have to eat my words a bit. The jobid requested by orte-ps is just 
>>> the "local" jobid - i.e., it is expecting you to provide a number from 0-N, 
>>> as I described below (copied here):
>>> 
>>>> A jobid of 1 indicates the primary application, 2 and above would specify 
>>>> comm_spawned jobs. 
>>> 
>>> Not providing the jobid at all corresponds to a wildcard and returns the 
>>> status of all jobs under that mpirun.
>>> 
>>> To specify which mpirun you want info on, you use the --pid option. It is 
>>> this option that isn't working properly - orte-ps returns info from all 
>>> mpiruns instead of restricting the output to the given pid.
>>> 
>>> I'll fix that part, and implement the parsable output.
>>> 
>>> 
>>> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
>>> 
>>>> 
>>>> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
>>>> 
>>>>> Hi Ralph,
>>>>> 
>>>>> I'd like three things :-)
>>>>> 
>>>>> a) A --report-jobid option that prints the jobid on the first line in a 
>>>>> form that can be passed to the -jobid option on ompi-ps. Probably tagging 
>>>>> it in the output if -tag-output is enabled (e.g. jobid:<jobid>) would be 
>>>>> a good idea.
>>>>> 
>>>>> b) The orte-ps command output to use the same jobid format.
>>>> 
>>>> I started looking at the above, and found that orte-ps is just plain wrong 
>>>> in the way it handles jobid. The jobid consists of two fields: a 16-bit 
>>>> number indicating the mpirun, and a 16-bit number indicating the job 
>>>> within that mpirun. Unfortunately, orte-ps sends a data request to every 
>>>> mpirun out there instead of only to the one corresponding to that jobid.
>>>> 
>>>> What we probably should do is have you indicate the mpirun of interest via 
>>>> the -pid option, and then let jobid tell us which job you want within that 
>>>> mpirun. A jobid of 1 indicates the primary application, 2 and above would 
>>>> specify comm_spawned jobs. A jobid of -1 would return the status of all 
>>>> jobs under that mpirun.
>>>> 
>>>> If multiple mpiruns are being reported, then the "jobid" in the report 
>>>> should again be the "local" jobid within that mpirun.
>>>> 
>>>> After all, you don't really care what the orte-internal 16-bit identifier 
>>>> is for that mpirun.
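
The two-field jobid described above can be sketched in Python as follows. This is only an illustration: the thread says the jobid is made of two 16-bit numbers but does not say which half is which, so the code assumes the mpirun identifier occupies the high 16 bits and the local jobid the low 16 bits. The function names split_jobid and join_jobid are hypothetical.

```python
def split_jobid(jobid):
    """Split a 32-bit jobid into (mpirun_id, local_jobid).

    Assumption: the mpirun identifier is the high 16 bits and the
    job-within-mpirun number is the low 16 bits.
    """
    if not 0 <= jobid <= 0xFFFFFFFF:
        raise ValueError(f"jobid out of 32-bit range: {jobid}")
    return (jobid >> 16) & 0xFFFF, jobid & 0xFFFF


def join_jobid(mpirun_id, local_jobid):
    """Combine the two 16-bit fields back into one integer jobid."""
    for value in (mpirun_id, local_jobid):
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"field out of 16-bit range: {value}")
    return (mpirun_id << 16) | local_jobid


if __name__ == "__main__":
    # 719585280 is the first jobid in the example output in this thread.
    print(split_jobid(719585280))  # (10980, 0)
```

Under that assumption, the integer form and a two-component form carry the same information, and the local jobid is the only part a user would need to pass alongside the pid of the mpirun.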
>>>> 
>>>>> 
>>>>> c) A more easily parsable output format from ompi-ps. It doesn't need to 
>>>>> be a full blown XML format, just something like the following would 
>>>>> suffice:
>>>>> 
>>>>> jobid:719585280:state:Running:slots:1:num procs:4
>>>>> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
>>>>> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
>>>>> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
>>>>> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
>>>>> jobid:345346663:state:Running:slots:1:num procs:2
>>>>> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
>>>>> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
>>>> 
>>>> Shouldn't be too hard to do - bunch of if-then-else statements required, 
>>>> though.
>>>> 
>>>>> 
>>>>> I'd be happy to help with any or all of these.
>>>> 
>>>> Appreciate the offer - let me see how hard this proves to be...
>>>> 
>>>>> 
>>>>> Cheers,
>>>>> Greg
>>>>> 
>>>>> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
>>>>> 
>>>>>> Hmmm...well, it looks like we could have made this nicer than we did :-/
>>>>>> 
>>>>>> If you add --report-uri to the mpirun command line, you'll get back the 
>>>>>> uri for that mpirun. This has the form of <jobid>:<uri>. As the -h 
>>>>>> option indicates:
>>>>>> 
>>>>>> -report-uri | --report-uri <arg0>  
>>>>>>                    Printout URI on stdout [-], stderr [+], or a file
>>>>>>                    [anything else]
>>>>>> 
>>>>>> The "jobid" required by the orte-ps command is the one reported there. 
>>>>>> We could easily add a --report-jobid option if that makes things easier.
>>>>>> 
>>>>>> As to the difference in how orte-ps shows the jobid...well, that's 
>>>>>> probably historical. orte-ps uses an orte utility function to print the 
>>>>>> jobid, and that utility always shows the jobid in component form. Again, 
>>>>>> we could add the integer version or just switch to it.
>>>>>> 
>>>>>> 
>>>>>> On Jul 22, 2011, at 7:01 AM, Greg Watson wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> Does anyone know if it's possible to get the orte jobid from the mpirun 
>>>>>>> command? If not, how are you supposed to get it to use with orte-ps? 
>>>>>>> Also, orte-ps reports the jobid in [x,y] notation, but the jobid 
>>>>>>> argument seems to be an integer. How does that work?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Greg
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> [email protected]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> 
>>>>>> 

