I believe the latest stable update of Galaxy included changes to drmaa.py
that allow a job to be rechecked indefinitely in the case of scheduler
communication errors, so your "Cluster could not complete job" errors may
instead be due to a filesystem race condition: the cluster node completes
the job, but the inode metadata updates haven't fully propagated, so the
output files appear to be missing to the job runner, which is on a
different server.  In that case, the config variable you want to increase
is the new "retry_job_output_collection" option, also part of the latest
update to stable.
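
For reference, job runner options like this one go in Galaxy's main config
file (universe_wsgi.ini in current releases).  A minimal sketch of what the
setting might look like; the section placement and the value shown are
assumptions, so check the sample config shipped with your update for the
actual default:

```ini
# universe_wsgi.ini -- under the [app:main] section (assumed placement).
# Number of times the job runner re-checks for a finished job's output
# files before declaring the job failed.  Raise this if your shared
# filesystem is slow to propagate metadata across nodes.
# (Value shown is illustrative; see universe_wsgi.ini.sample for the
# shipped default.)
retry_job_output_collection = 5
```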

On Wed, Feb 22, 2012 at 5:52 AM, Aurélien Bernard <
aurelien.bern...@univ-montp2.fr> wrote:

> Hello everybody :)
>
>
> Today, I have a question related to timeout management in Galaxy.
>
> More particularly, I'm searching for a way to set (in a configuration file
> if possible) all timeouts related to DRMAA and timeouts related to
> communication between Galaxy and SGE.
>
>
> My goal is to increase the current timeouts to avoid the "Cluster could
> not complete job" error on successful jobs when job status checking
> temporarily fails (due to heavy write load on the hard drive or
> whatever).
>
>
> Is this possible?
>
>
> Thank you in advance,
>
> Have a nice day
>
> A. Bernard
>
> --
> Aurélien Bernard
> IE Bioprogrammeur - CNRS
> Université des sciences Montpellier II
> Institut des Sciences de l'Evolution
> France
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>