On May 21, 2013, at 2:13 PM, Christopher Fields <cjfie...@illinois.edu> wrote:

> On May 20, 2013, at 3:08 PM, Nate Coraor <n...@bx.psu.edu> wrote:
>> ...
>> Hi Chris,
>> 
>> We're disconnecting under all normal conditions and most error conditions - 
>> it looks like only a few conditions would not properly disconnect:
>> 
>> - If pbs_submit() fails 5 times in a row
>> - If an exception is raised anywhere in the queue_job() method after 
>> pbs_connect() is called
>> - If the call to pbs_statjob() in check_all_jobs() or check_single_job() 
>> raises an exception (if that's even possible)
>> 
>> For the exceptions, you would see that an exception was caught in the log 
>> file, so you should be able to determine if this is happening.
>> 
>> For the pbs_submit() case, you'd see the message "All attempts to submit job 
>> failed".
>> 
>> You may want to move the call to pbs_connect() in queue_job() so that it 
>> occurs immediately prior to the call to pbs_submit() and see if that makes a 
>> difference.  The reason we connect so early on is to avoid writing out the 
>> job's files if the PBS server doesn't exist anyway.
> 
> Yes, seeing the pbs_submit() case.  Lowering the # handlers does seem to 
> help, but we still seem to run into this after 

Sorry, saw a unicorn.  Meant, 'we still seem to run into this after a period of 
time on the cluster'.  Coffee…  :P

chris
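
Editorial sketch: the three leak conditions Nate lists all come down to leaving queue_job() without reaching the disconnect call. One way to rule that out is to wrap everything after pbs_connect() in try/finally, and to connect only after the job files are written, as he suggests trying. The code here is a minimal illustration, not Galaxy's actual runner code. FakePBS is a hypothetical stand-in for the real PBS bindings, its simplified call signatures are assumptions, and write_job_files and max_tries are made-up names.

```python
class FakePBS:
    """Hypothetical stand-in for the PBS bindings; it only counts connections."""

    def __init__(self, fail_submits=0):
        self.fail_submits = fail_submits  # number of submits that fail first
        self.submit_calls = 0
        self.open_connections = 0

    def pbs_connect(self, server):
        self.open_connections += 1
        return 1  # assumed: some connection handle

    def pbs_submit(self, conn, script):
        self.submit_calls += 1
        if self.submit_calls <= self.fail_submits:
            return None  # assumed: a falsy value signals failure
        return "123.server"  # made-up job id

    def pbs_disconnect(self, conn):
        self.open_connections -= 1


def queue_job(pbs, server, script, write_job_files, max_tries=5):
    """Submit a job and disconnect on every exit path."""
    write_job_files()  # runs before connecting, so a failure here cannot leak
    conn = pbs.pbs_connect(server)
    try:
        for _ in range(max_tries):
            job_id = pbs.pbs_submit(conn, script)
            if job_id:
                return job_id
        # Caller would log "All attempts to submit job failed" here.
        return None
    finally:
        # Runs on success, on repeated submit failure, and on any exception.
        pbs.pbs_disconnect(conn)
```

The trade-off Nate mentions still applies: connecting late means the job's files get written even when the PBS server turns out to be unreachable.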
