On May 19, 2013, at 11:41 PM, Fields, Christopher J wrote:
> I've been seeing this error popping up quite a bit recently (we're using
> Torque 3.0.5), which is giving a general 'cluster failure' Galaxy error on
> galaxy.jobs.runners.pbs WARNING 2013-05-19 18:56:44,073 (10588) pbs_submit
> failed (try 1/5), PBS error 15033: No free connections
> This only recently started happening, after we had been using the cluster
> for over a year, and I'm not sure why it would start acting up now. It appears
> to be related to a bug/defect in pbs_python, which doesn't seem to have been
> fixed yet (I posted a query asking whether this has been addressed):
> Restarting helps, but are there any other recommended workarounds? Is the
> only solution recompiling Torque with NCONNECTS?
I've always increased NCONNECTS to avoid this. You may also want to decrease
the number of workers for the runners as this should decrease the number of
connections that Galaxy makes.
You could also try the DRMAA runner.
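
For what it's worth, the "try 1/5" in the log line above suggests the runner already retries a failed submission a few times before giving up. A minimal sketch of that retry-with-backoff pattern follows. It is not Galaxy's or pbs_python's actual code: `SubmitError`, `submit_with_retry`, the `submit` callable and the delay values are all hypothetical names chosen for illustration. Only the error number 15033 comes from the log.

```python
import time

# Error number taken from the log line above ("No free connections").
NO_FREE_CONNECTIONS = 15033


class SubmitError(Exception):
    """Hypothetical exception carrying a PBS error number."""

    def __init__(self, errno, message=""):
        super().__init__("PBS error %d: %s" % (errno, message))
        self.errno = errno


def submit_with_retry(submit, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call submit() until it succeeds or max_tries attempts have been made.

    Only the "no free connections" error is retried; any other error,
    or a failure on the final attempt, is re-raised to the caller.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return submit()
        except SubmitError as exc:
            if exc.errno != NO_FREE_CONNECTIONS or attempt == max_tries:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Backing off between tries gives the server a chance to release connections, but it only masks the limit. Raising NCONNECTS or reducing the number of workers addresses the cause.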
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> To search Galaxy mailing lists use the unified search at: