Job 0 is mpirun and its daemons - I can have orte-ps ignore that job, as I doubt users care about it :-)
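Until orte-ps itself drops job 0, a client can filter it out. The following is a minimal Python sketch that parses the `orte-ps --parseable` lines quoted below. Its assumptions come only from the sample output in this thread: each line is a flat run of `key:value` pairs, no key or value contains a colon, and each `process` line belongs to the nearest `jobid` line above it. The function names are hypothetical helpers and are not part of Open MPI.

```python
"""Sketch: parse `orte-ps --parseable` output into jobs and their processes.

Assumptions drawn only from the sample output in this thread: every line is
a flat run of key:value pairs, no key or value contains ':', and each
process line belongs to the closest jobid line above it.
"""


def parse_fields(line: str) -> dict[str, str]:
    """Turn 'k1:v1:k2:v2' into {'k1': 'v1', 'k2': 'v2'} (hypothetical helper)."""
    parts = line.split(":")
    if len(parts) % 2 != 0:
        raise ValueError(f"odd number of fields in line: {line!r}")
    return dict(zip(parts[0::2], parts[1::2]))


def parse_orte_ps(text: str, skip_job_zero: bool = True) -> list[dict]:
    """Return one {'info': fields, 'procs': [fields, ...]} dict per jobid line."""
    jobs: list[dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        fields = parse_fields(line)
        if line.startswith("jobid:"):
            jobs.append({"info": fields, "procs": []})
        elif line.startswith(("process:", "process_name:")):
            # Accept the "process" key from the r24929 output as well as the
            # "process_name" key from the original proposal.
            if not jobs:
                raise ValueError("process line appears before any jobid line")
            jobs[-1]["procs"].append(fields)
        # The leading "mpirun:..." summary line is not used by this sketch.
    if skip_job_zero:
        # Job 0 is mpirun and its daemons, which users rarely care about.
        jobs = [job for job in jobs if job["info"]["jobid"] != "0"]
    return jobs


if __name__ == "__main__":
    # Sample copied from the orte-ps --parseable output quoted in this thread.
    sample = """\
mpirun:47520:num nodes:1:num jobs:2
jobid:0:state:RUNNING:slots:0:num procs:0
jobid:1:state:RUNNING:slots:1:num procs:4
process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
"""
    for job in parse_orte_ps(sample):
        pids = [proc["pid"] for proc in job["procs"]]
        print(job["info"]["jobid"], job["info"]["num procs"], pids)
```

Splitting on every colon keeps the parser trivial, but it will break if a future field (a hostname or a URI, for example) ever contains a colon. A real client would need either an escaping rule or a fixed field order from orte-ps.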
On Jul 25, 2011, at 12:25 PM, Greg Watson wrote:

> Ralph,
>
> The output format looks good, but I'm not sure it's quite correct. If I run
> the mpirun command, I see the following:
>
> mpirun:47520:num nodes:1:num jobs:2
> jobid:0:state:RUNNING:slots:0:num procs:0
> jobid:1:state:RUNNING:slots:1:num procs:4
> process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
> process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
> process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
> process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
>
> Seems to indicate there are two jobs, but one of them has 0 procs. Is that
> expected? Not a huge problem, since I can just ignore the job with 0 procs.
>
> Greg
>
>
> On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:
>
>> Okay, you should have it in r24929. Use:
>>
>> orte-ps --parseable
>>
>> to get the new output.
>>
>>
>> On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
>>
>>> Gar - have to eat my words a bit. The jobid requested by orte-ps is just
>>> the "local" jobid - i.e., it is expecting you to provide a number from 0-N,
>>> as I described below (copied here):
>>>
>>>> A jobid of 1 indicates the primary application, 2 and above would specify
>>>> comm_spawned jobs.
>>>
>>> Not providing the jobid at all corresponds to wildcard and returns the
>>> status of all jobs under that mpirun.
>>>
>>> To specify which mpirun you want info on, you use the --pid option. It is
>>> this option that isn't working properly - orte-ps returns info from all
>>> mpiruns and doesn't check to provide only data from the given pid.
>>>
>>> I'll fix that part, and implement the parsable output.
>>>
>>>
>>> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
>>>
>>>>
>>>> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> I'd like three things :-)
>>>>>
>>>>> a) A --report-jobid option that prints the jobid on the first line in a
>>>>> form that can be passed to the -jobid option on ompi-ps. Probably tagging
>>>>> it in the output if -tag-output is enabled (e.g. jobid:<jobid>) would be
>>>>> a good idea.
>>>>>
>>>>> b) The orte-ps command output to use the same jobid format.
>>>>
>>>> I started looking at the above, and found that orte-ps is just plain wrong
>>>> in the way it handles jobid. The jobid consists of two fields: a 16-bit
>>>> number indicating the mpirun, and a 16-bit number indicating the job
>>>> within that mpirun. Unfortunately, orte-ps sends a data request to every
>>>> mpirun out there instead of only to the one corresponding to that jobid.
>>>>
>>>> What we probably should do is have you indicate the mpirun of interest via
>>>> the -pid option, and then let jobid tell us which job you want within that
>>>> mpirun. A jobid of 1 indicates the primary application, 2 and above would
>>>> specify comm_spawned jobs. A jobid of -1 would return the status of all
>>>> jobs under that mpirun.
>>>>
>>>> If multiple mpiruns are being reported, then the "jobid" in the report
>>>> should again be the "local" jobid within that mpirun.
>>>>
>>>> After all, you don't really care what the orte-internal 16-bit identifier
>>>> is for that mpirun.
>>>>
>>>>>
>>>>> c) A more easily parsable output format from ompi-ps.
>>>>> It doesn't need to be a full-blown XML format, just something like the
>>>>> following would suffice:
>>>>>
>>>>> jobid:719585280:state:Running:slots:1:num procs:4
>>>>> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
>>>>> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
>>>>> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
>>>>> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
>>>>> jobid:345346663:state:running:slots:1:num procs:2
>>>>> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
>>>>> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
>>>>
>>>> Shouldn't be too hard to do - bunch of if-then-else statements required,
>>>> though.
>>>>
>>>>>
>>>>> I'd be happy to help with any or all of these.
>>>>
>>>> Appreciate the offer - let me see how hard this proves to be...
>>>>
>>>>>
>>>>> Cheers,
>>>>> Greg
>>>>>
>>>>> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
>>>>>
>>>>>> Hmmm...well, it looks like we could have made this nicer than we did :-/
>>>>>>
>>>>>> If you add --report-uri to the mpirun command line, you'll get back the
>>>>>> URI for that mpirun. This has the form of <jobid>:<uri>. As the -h
>>>>>> option indicates:
>>>>>>
>>>>>> -report-uri | --report-uri <arg0>
>>>>>> Printout URI on stdout [-], stderr [+], or a file
>>>>>> [anything else]
>>>>>>
>>>>>> The "jobid" required by the orte-ps command is the one reported there.
>>>>>> We could easily add a --report-jobid option if that makes things easier.
>>>>>>
>>>>>> As to the difference in how orte-ps shows the jobid...well, that's
>>>>>> probably historical. orte-ps uses an orte utility function to print the
>>>>>> jobid, and that utility always shows the jobid in component form. Again,
>>>>>> could add or just use the integer version.
>>>>>>
>>>>>>
>>>>>> On Jul 22, 2011, at 7:01 AM, Greg Watson wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Does anyone know if it's possible to get the orte jobid from the mpirun
>>>>>>> command? If not, how are you supposed to get it to use with orte-ps?
>>>>>>> Also, orte-ps reports the jobid in [x,y] notation, but the jobid
>>>>>>> argument seems to be an integer. How does that work?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Greg
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> [email protected]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
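Ralph describes the jobid as two 16-bit fields: one for the mpirun and one for the job within that mpirun. Under that description, the full integer jobid and the [x,y] component notation are related by simple bit operations. Below is a minimal Python sketch of that conversion. It assumes, without confirmation from this thread, that the mpirun field occupies the upper 16 bits and the local job number the lower 16 bits, and that [x,y] prints them in that order. The helper names are hypothetical.

```python
"""Sketch: convert between a 32-bit jobid and its two 16-bit fields.

Assumption (not confirmed in this thread): the mpirun field sits in the
upper 16 bits and the local job number in the lower 16 bits.
"""

MPIRUN_SHIFT = 16
FIELD_MASK = 0xFFFF


def split_jobid(jobid: int) -> tuple[int, int]:
    """Return (mpirun_field, local_jobid) for a 32-bit integer jobid."""
    if not 0 <= jobid <= 0xFFFFFFFF:
        raise ValueError(f"jobid out of 32-bit range: {jobid}")
    return (jobid >> MPIRUN_SHIFT) & FIELD_MASK, jobid & FIELD_MASK


def join_jobid(mpirun_field: int, local_jobid: int) -> int:
    """Inverse of split_jobid: pack two 16-bit fields into one integer."""
    for value in (mpirun_field, local_jobid):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"field out of 16-bit range: {value}")
    return (mpirun_field << MPIRUN_SHIFT) | local_jobid


def format_component(jobid: int) -> str:
    """Render a jobid in the [x,y] component notation (assumed field order)."""
    mpirun_field, local_jobid = split_jobid(jobid)
    return f"[{mpirun_field},{local_jobid}]"


if __name__ == "__main__":
    # 719585280 is the sample jobid from Greg's proposed parseable format.
    print(format_component(719585280))  # prints [10980,0]
```

Under this assumed layout, only the low 16 bits are the "local" jobid that Ralph later says orte-ps expects: 0 for mpirun and its daemons, 1 for the primary application, and 2 and above for comm_spawned jobs.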
