Hi,

> What is an alternate way of reaping the exit status and resource
> utilization of a finished job in a TORQUE cluster?

As far as I know there is none (maybe some C API?). TORQUE writes a job's utilization info directly either into the job accounting log or (partly) into the server log.
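If reading the accounting log is an option, a rough sketch like the following could pull the relevant fields out of a job's end record. It assumes the usual semicolon-separated record layout (timestamp;type;job id;key=value pairs) with type "E" for job end; the function name job_end_record is made up for this example, and the location of the log files depends on the installation.

```shell
# Print Exit_status and resources_used.* from a job's end ("E") record.
# Reads accounting-log lines on stdin; $1 is the numeric job ID.
# Assumed record layout: timestamp;type;jobid;key=value key=value ...
job_end_record() {
    awk -F';' -v id="$1" '
        $2 == "E" && ($3 == id || index($3, id ".") == 1) {
            n = split($4, kv, " ")
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^(Exit_status|resources_used\.)/) print kv[i]
        }'
}
# Usage (the path is site-specific):
#   job_end_record 1097122 < <accounting dir>/<YYYYMMDD>
```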

However, TORQUE can be configured not to remove a completed job from memory immediately, but to keep it for a configurable amount of time so that the qstat command can still display its utilization, including the exit status. This is not the default in the TORQUE server and has to be enabled with the server parameter keep_completed:
http://docs.adaptivecomputing.com/torque/6-0-1/help.htm#topics/torque/2-jobs/keepingCompletedJobs.htm?Highlight=keep_completed
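A minimal sketch of how this could be switched on with qmgr, assuming you have manager rights on the server. The 300 seconds are only an illustrative value, not necessarily what we use here.

```shell
# Keep completed jobs visible to qstat for 300 seconds (illustrative value).
qmgr -c "set server keep_completed = 300"
# Check that the setting shows up in the server configuration.
qmgr -c "print server" | grep keep_completed
```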

We have that enabled here and I guess most torque sites do the same.

In that case the qstat output looks as follows. Note the 'C' (completed) in the job state (S) column:

qstat -a 1097122


                                                                                  Req'd  Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
1097122                 abcd        all      UWOT_O_Grid_PDE    7071     4     24   48gb  10:00:00 C        --

The command qstat -f displays the job's exit status:

qstat -f 1097122
.....
    etime = Wed Jun  1 22:02:27 2016
    exit_status = 0
    submit_args = abc
....
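If you need the value in a script, the output above can be filtered with something like the sketch below. It assumes the "name = value" layout shown here and does not handle long values that wrap onto continuation lines. The function name qstat_attr is just made up for this example.

```shell
# Read `qstat -f` output on stdin and print the value of one attribute
# (default: exit_status). Wrapped continuation lines are not handled.
qstat_attr() {
    awk -v key="${1:-exit_status}" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # drop leading indentation
            if (index(line, key " = ") == 1) {    # line starts with "key = "
                print substr(line, length(key) + 4)
                exit
            }
        }'
}
# Usage:
#   qstat -f 1097122 | qstat_attr          # exit status
#   qstat -f 1097122 | qstat_attr etime    # any other attribute
```

This only works while the job is still kept in memory, i.e. within the keep_completed window.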

Best,
Gizo




Ciao,
R

--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/#Riccardo.Murri

S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)

Tel: +41 44 635 4208
Fax: +41 44 635 6888


--
Dr. Gizo Nanava
Leibniz Universitaet IT Services
Leibniz Universitaet Hannover
Schlosswender Str. 5
D-30159 Hannover
Tel +49 511 762 7919085
http://www.luis.uni-hannover.de

