Dear list,

I have two questions for all DRMAA users. Here is the first one.

I was checking how our queuing system (Univa Grid Engine) and Galaxy react when jobs exceed their run time or memory limits.

I found that the Python drmaa library cannot query the job status after the job has finished (for both successful and unsuccessful jobs).

In lib/galaxy/jobs/runners/drmaa.py, the following call raises an exception:
    self.ds.job_status( external_job_id )

Is this always the case, or might this be a problem with our Grid Engine setup?
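For what it's worth, one defensive pattern in the caller would be to treat the "job does not exist" error as "job already finished, state unknown" instead of letting it propagate. A minimal sketch (the FakeSession and InvalidJobException classes below are stand-ins, so it runs without a cluster; this is not how Galaxy currently handles it):

```python
class InvalidJobException(Exception):
    """Stand-in for drmaa.errors.InvalidJobException (code 18)."""

class FakeSession(object):
    """Stand-in for drmaa.Session: pretends the job is already gone."""
    def jobStatus(self, jobid):
        raise InvalidJobException(
            "code 18: The job specified by the 'jobid' does not exist.")

def safe_job_status(session, jobid):
    """Query the job state, mapping 'job not found' to a sentinel value."""
    try:
        return session.jobStatus(jobid)
    except InvalidJobException:
        # The DRM has already reaped the job; its state can no longer
        # be queried via jobStatus.
        return 'finished_unknown'

print(safe_job_status(FakeSession(), '42'))  # finished_unknown
```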

I have attached some code for testing. The first call to s.jobStatus(jobid) works, but the second one, after s.wait(...), fails with "drmaa.errors.InvalidJobException: code 18: The job specified by the 'jobid' does not exist."

The same error shows up in the Galaxy logs. As a consequence, jobs that hit the limits are shown as completed successfully in Galaxy.

Interestingly, quite a bit of information can be obtained from the return value of s.wait. I was wondering whether this could be used to differentiate successful from failed jobs; in particular, hasExited, hasSignal, and terminatedSignal differ between the two cases.

Cheers,
Matthias

#!/usr/bin/env python

from __future__ import print_function
import drmaa


def main():
    """Submit a job.
    Note: needs a file called sleeper.sh in the working directory.
    """
    with drmaa.Session() as s:
        print('Creating job template')
        jt = s.createJobTemplate()

        jt.jobName = "foo"
        jt.workingDirectory = "/home/songalax/"
        jt.remoteCommand = '/home/songalax/sleeper.sh'
        jt.args = ['30', 'Simon_says']
        jt.joinFiles = True
        jt.nativeSpecification = "-l h_rt=10 -l h_vmem=1G -pe smp 2 -w n"

        jobid = s.runJob(jt)
        print('Your job has been submitted with id ' + jobid)

        decodestatus = {
            drmaa.JobState.UNDETERMINED: 'process status cannot be determined',
            drmaa.JobState.QUEUED_ACTIVE: 'job is queued and active',
            drmaa.JobState.SYSTEM_ON_HOLD: 'job is queued and in system hold',
            drmaa.JobState.USER_ON_HOLD: 'job is queued and in user hold',
            drmaa.JobState.USER_SYSTEM_ON_HOLD: 'job is queued and in user and system hold',
            drmaa.JobState.RUNNING: 'job is running',
            drmaa.JobState.SYSTEM_SUSPENDED: 'job is system suspended',
            drmaa.JobState.USER_SUSPENDED: 'job is user suspended',
            drmaa.JobState.DONE: 'job finished normally',
            drmaa.JobState.FAILED: 'job finished, but failed',
        }

        # This call works while the job is queued or running.
        print(decodestatus[s.jobStatus(jobid)])

        retval = s.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        print('Job: {0} finished with status {1}'.format(retval.jobId, retval.hasExited))
        print('exitStatus {0}\nhasCoreDump {1}\nhasExited {2}\nhasSignal {3}\n'
              'jobId {4}\nresourceUsage {5}\nterminatedSignal {6}\nwasAborted {7}'.format(
                  retval.exitStatus, retval.hasCoreDump, retval.hasExited,
                  retval.hasSignal, retval.jobId, retval.resourceUsage,
                  retval.terminatedSignal, retval.wasAborted))

        # This second call fails with InvalidJobException (code 18).
        print(decodestatus[s.jobStatus(jobid)])

        print('Cleaning up')
        s.deleteJobTemplate(jt)


if __name__ == '__main__':
    main()

Attachment: sleeper.sh
Description: application/shellscript

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/
