On Jul 12, 2012, at 3:03 AM, Geert Vandeweyer wrote:

> Hi,
> 
> Today I ran into a cluster error on our local instance using the latest 
> galaxy-dist and torque/pbs with the python-pbs binding.
> 
> 
> While the Galaxy process was under heavy load, the handler processes 
> apparently failed to contact the pbs_server, although the pbs_server was 
> still up and running. After that, many statements like the following kept 
> appearing in the handler.log file:
> 
> galaxy.jobs.runners.pbs DEBUG 2012-07-11 17:39:06,649 
> (11647/12788.pbs_master_address) Skipping state check because PBS server 
> connection failed
> 
> After restarting the galaxy process (run.sh), everything worked again, with 
> no changes to the pbs_server.
> 
> Would it be possible to set up some checks for this failure? For example:
> - contact the system admin
> - restart Galaxy
> - automatically retry job submission after a while, so as not to crash 
>   workflows.

Hi Geert,

It'd be useful to retry submission rather than fail.  I doubt we'll get to it 
soon, but would welcome any contributions that implement this.  Is restarting 
Galaxy absolutely necessary, or will job submission begin to succeed again 
after load goes down?

--nate
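
For illustration, the retry idea discussed above could be sketched roughly as 
follows. This is a minimal sketch under stated assumptions, not Galaxy's 
actual code: `submit_with_retry`, `SubmissionFailed`, and the `submit` 
callable are hypothetical names, and `submit` stands in for whatever function 
performs the PBS submission and raises an exception when the server cannot be 
contacted.

```python
import time


class SubmissionFailed(Exception):
    """Raised when every submission attempt has failed (hypothetical name)."""


def submit_with_retry(submit, max_attempts=5, initial_delay=1.0,
                      backoff=2.0, sleep=time.sleep):
    """Call submit() until it succeeds or max_attempts is reached.

    submit is an assumed zero-argument callable that returns a job id on
    success and raises an exception when the PBS server is unreachable.
    The wait between attempts grows by the factor `backoff` each time.
    """
    delay = initial_delay
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except Exception as exc:
            last_error = exc
            if attempt == max_attempts:
                break
            # Wait before trying again, then lengthen the next wait.
            sleep(delay)
            delay *= backoff
    raise SubmissionFailed(
        "gave up after %d attempts: %s" % (max_attempts, last_error))
```

With something like this in place, a job would fail only after the server 
had been unreachable for the whole retry window, rather than on the first 
failed connection. The `sleep` parameter exists only so the waiting can be 
replaced in tests.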

> 
> best regards,
> 
> Geert Vandeweyer
> 
> -- 
> 
> Geert Vandeweyer, Ph.D.
> Department of Medical Genetics
> University of Antwerp
> Prins Boudewijnlaan 43
> 2650 Edegem
> Belgium
> Tel: +32 (0)3 275 97 56
> E-mail: geert.vandewe...@ua.ac.be
> http://ua.ac.be/cognitivegenetics
> http://www.linkedin.com/pub/geert-vandeweyer/26/457/726
> 
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
> http://lists.bx.psu.edu/

