Re: [galaxy-dev] job status when SGE kills/aborts job

Shantanu Pavgi Fri, 29 Jul 2011 18:54:30 -0700

On Jul 29, 2011, at 4:13 PM, Ka Ming Nip wrote:

Hi Shantanu,


I am also using an SGE cluster and the DRMAA runner for my Galaxy install, and I 
see the same issue with jobs that were killed.

How did you define the run-time and memory limits in your DRMAA URLs?

I had to add "-w n" to the DRMAA URLs in order for my jobs to be dispatched to 
the cluster. However, someone said (on another thread) that doing so might hide 
the errors. I am not sure if this is the cause, since my jobs won't be 
dispatched at all if "-w n" is not in the DRMAA URLs.

Ka Ming



The drmaa/SGE URL in our configuration looks something like this:
{{{
drmaa:// -V -m be -M <email.address.for.notification> -l vf=<memory>,h_rt=<hard-run-time>,s_rt=<soft-run-time>,h_vmem=<memory> /
}}}

We don't use the "-w n" option in our configuration. "-w n" switches off SGE's 
validation of the submitted job (the check of whether its resource requests 
could be satisfied at all); refer to the qsub manual for details. The resource 
attributes accepted by -l (complex configuration options) are described here: 
http://linux.die.net/man/5/sge_complex .
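To make the structure of that URL concrete, here is a small Python sketch that 
assembles the same kind of string from a dict of -l resources. The helper name 
build_drmaa_url and the example values are hypothetical; the flags are the ones 
in the URL above (-V exports the submitting environment, -m be mails at job 
begin and end, -M sets the recipient, -l carries the comma-separated resource 
requests).

```python
from __future__ import annotations


def build_drmaa_url(email: str, resources: dict[str, str]) -> str:
    """Hypothetical helper: build a 'drmaa://<native spec> /' runner URL."""
    # -V: export the submitting environment to the job
    # -m be: send mail at job begin and end; -M: who receives that mail
    parts = ["-V", "-m", "be", "-M", email]
    if resources:
        # SGE takes several resource requests as one comma-separated -l list;
        # dicts keep insertion order, so the output order matches the input.
        requests = ",".join(f"{name}={value}" for name, value in resources.items())
        parts += ["-l", requests]
    return "drmaa:// " + " ".join(parts) + " /"


if __name__ == "__main__":
    print(build_drmaa_url(
        "user@example.org",
        {"vf": "4G", "h_rt": "02:00:00", "s_rt": "01:55:00", "h_vmem": "4G"},
    ))
```

Generating the string from one place like this makes it harder for the memory 
values in vf and h_vmem, or the two run-time limits, to drift apart between 
tools.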

Hope this helps you.

--
Shantanu.


________________________________________
From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] 
On Behalf Of Shantanu Pavgi [pa...@uab.edu]
Sent: July 29, 2011 1:56 PM
To: galaxydev psu
Subject: [galaxy-dev] job status when SGE kills/aborts job

We are using an SGE cluster with our Galaxy install. We have specified resource 
and run-time limits for certain tools using tool-specific DRMAA URL 
configuration, e.g.:
- run-time (h_rt, s_rt)
- memory (vf, h_vmem).
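One related point worth checking: per SGE's queue_conf(5), a job that exceeds 
h_rt is killed with SIGKILL, while one that exceeds s_rt is first warned with 
SIGUSR1, so s_rt is normally set below h_rt. A minimal Python sketch of such a 
sanity check, assuming limits written in SGE's [[hours:]minutes:]seconds form 
(the function names are hypothetical and not part of Galaxy or SGE):

```python
from __future__ import annotations


def rt_to_seconds(spec: str) -> int:
    """Convert an SGE time limit such as '02:00:00', '90:00' or '3600' to seconds.

    Assumes the [[hours:]minutes:]seconds form; 'INFINITY' is not handled here.
    """
    fields = spec.split(":")
    if len(fields) > 3:
        raise ValueError(f"not a [[hours:]minutes:]seconds value: {spec!r}")
    seconds = 0
    for field in fields:
        # Each further field shifts the running total up one base-60 place.
        seconds = seconds * 60 + int(field)
    return seconds


def check_runtime_limits(h_rt: str, s_rt: str) -> None:
    """Raise ValueError unless the soft limit expires before the hard limit."""
    hard, soft = rt_to_seconds(h_rt), rt_to_seconds(s_rt)
    if soft >= hard:
        raise ValueError(
            f"s_rt={s_rt} should be shorter than h_rt={h_rt}, otherwise the "
            "SIGUSR1 warning arrives no earlier than the SIGKILL"
        )


if __name__ == "__main__":
    check_runtime_limits("02:00:00", "01:55:00")  # passes silently
```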

This helps the scheduler dispatch jobs to an appropriate node and also prevents 
nodes from crashing because of excessive memory consumption. However, sometimes 
a job needs more resources and/or run-time than specified in the DRMAA URL 
configuration. In such cases SGE kills the job and we get an email 
notification with the corresponding job summary. However, the Galaxy web 
interface doesn't show any error for such failures, and the job table doesn't 
contain any related state/info either. The jobs are shown in green boxes, 
meaning they completed without any failure, when in reality they have been 
killed/aborted by the scheduler. This is really confusing, as the job status 
shown by Galaxy is inconsistent with the status reported by SGE/DRMAA. Has 
anyone else experienced and/or addressed this issue? Any comments or 
suggestions will be really helpful.
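A minimal sketch of the kind of check that seems to be missing: instead of 
treating every finished DRMAA job as successful, inspect how it ended. The 
attribute names mirror those of drmaa-python's JobInfo (as returned by 
Session.wait()); the FinishedJob class and final_state function are 
hypothetical illustrations, not Galaxy's actual job runner code.

```python
from __future__ import annotations

import signal
from dataclasses import dataclass


@dataclass
class FinishedJob:
    """Hypothetical stand-in using the attribute names of drmaa-python's JobInfo."""
    hasExited: bool        # job ended by exiting (rather than by a signal)
    exitStatus: int        # exit code, meaningful only if hasExited
    hasSignal: bool        # job was terminated by a signal
    terminatedSignal: str  # e.g. "SIGKILL", meaningful only if hasSignal
    wasAborted: bool       # DRMAA: job ended before it entered the running state


def final_state(job: FinishedJob) -> tuple[str, str]:
    """Return ("ok", "") or ("error", reason) instead of assuming success."""
    if job.wasAborted:
        return "error", "job was aborted before it started running"
    if job.hasSignal:
        # e.g. SIGKILL sent by SGE after the job exceeded h_rt
        return "error", f"job was killed by {job.terminatedSignal}"
    if not job.hasExited:
        return "error", "job did not exit normally"
    if job.exitStatus > 128:
        # Shell convention: status 128 + N means a child died from signal N,
        # which is how a limit kill can look when the tool runs under a wrapper.
        try:
            sig = signal.Signals(job.exitStatus - 128).name
        except ValueError:
            sig = f"signal {job.exitStatus - 128}"
        return "error", f"command exited with status {job.exitStatus} (probably {sig})"
    if job.exitStatus != 0:
        return "error", f"command exited with status {job.exitStatus}"
    return "ok", ""
```

With drmaa-python, the object returned by Session.wait() should expose the 
same attribute names, so it could in principle be passed in directly. How SGE 
reports a limit kill (signal vs. non-zero exit status) may depend on the SGE 
version and on whether the tool runs under a wrapper shell, which is why both 
paths are handled.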

Thanks,
Shantanu.
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/
