Hi Nate,

We are still waiting for a reply about the error from the Torque community. As
for debugging, we did try tracejob, but since the job was never actually
submitted, Torque had nothing logged for it (it wasn't even a job yet).
Meanwhile, we are retrying the Galaxy deployment on a different version of
Torque (2.3.6) with pbs_python (2.6), but now face a new error:

galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,345 (34/2519.server) Removed from PBS queue before job completion
galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,344 (34/2519.server) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,351 Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error.
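
Unlike the earlier failure, these messages do carry a PBS job id, so tracejob
should have something to look up this time. Below is a rough sketch of pulling
the ids out of the Galaxy log. It assumes the prefix in parentheses is always
(Galaxy job id/PBS job id), as in the lines above; the helper name pbs_job_ids
is made up for illustration.

```python
import re

# Matches the "(galaxy_id/pbs_id)" prefix seen in the PBS runner messages.
# Assumption: the prefix always has this shape.
JOB_TAG = re.compile(r"\((\d+)/([^)\s]+)\)")


def pbs_job_ids(log_text):
    """Return the unique (galaxy_job_id, pbs_job_id) pairs in log_text, in order."""
    seen = []
    for galaxy_id, pbs_id in JOB_TAG.findall(log_text):
        pair = (int(galaxy_id), pbs_id)
        if pair not in seen:
            seen.append(pair)
    return seen


sample = (
    "galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,345 (34/2519.server) "
    "Removed from PBS queue before job completion\n"
    "galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,344 (34/2519.server) "
    "PBS job has left queue\n"
)

for galaxy_id, pbs_id in pbs_job_ids(sample):
    # Each PBS id can then be passed to tracejob on the Torque server.
    print("Galaxy job %d -> tracejob %s" % (galaxy_id, pbs_id))
```

Each PBS id it prints can then be given to tracejob on the Torque server.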

One particular job gets removed from the queue, which fails the entire workflow.
Please let me know if you have any information or have come across this error
before.

Many thanks for your time.


-----Original Message-----
From: Nate Coraor [mailto:n...@bx.psu.edu] 
Sent: Tuesday, February 15, 2011 10:30 PM
To: Sonali Amonkar
Cc: Galaxy Dev
Subject: Re: [galaxy-dev] Error with setuptools version in Galaxy installation 
on Cluster

Sonali Amonkar wrote:
> On further digging, we found that the script is failing in the following part 
> of $GALAXY_HOME/lib/galaxy/jobs/runners/pbs.py:
>         # submit
>         galaxy_job_id = job_wrapper.job_id
>         log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
>         log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
>         job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)

This line is where it's failing to submit the job.

>         pbs.pbs_disconnect(c)
>         # check to see if it submitted
>         if not job_id:
>             errno, text = pbs.error()
>             log.debug( "(%s) pbs_submit failed, PBS error %d: %s" % (galaxy_job_id, errno, text) )
>             job_wrapper.fail( "Unable to run this job due to a cluster error" )
>             return
> Could this be a problem related to the pbs_python egg (pbs_python-4.1.0)
> being used by Galaxy, or a Torque-specific issue? Just to reiterate, we are on
> a development snapshot of Torque, which is hard to replace as many other
> people are using it.

It's possible that pbs_python is generating code which is incompatible, but 
since it's linked against your version of TORQUE this should not be the case.

It's hard to say exactly what's causing this since it's outside of Galaxy.  I'm 
not sure if TORQUE has any client-side debugging that would help with this 
issue but that's where I'd start.
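
One way to take Galaxy out of the picture is to make the same pbs_submit call
from a small standalone script. Below is a minimal sketch of that check. It
assumes pbs_python imports as pbs and provides pbs_default, pbs_connect,
new_attropl and ATTR_N alongside the calls quoted above; the function name
try_submit and the job name "submit_test" are made up for illustration.

```python
def try_submit(pbs, job_file, queue_name=None, server_name=None):
    """Submit job_file through pbs_python and return (job_id, error_text).

    `pbs` is the imported pbs_python module. It is passed in so that this
    function can be loaded on a host that does not have pbs_python.
    """
    server = server_name or pbs.pbs_default()
    c = pbs.pbs_connect(server)
    if c <= 0:
        return None, "could not connect to %r: %s" % (server, pbs.error())

    # A single attribute (the job name) is enough for a submission test.
    attrs = pbs.new_attropl(1)
    attrs[0].name = pbs.ATTR_N
    attrs[0].value = "submit_test"

    job_id = pbs.pbs_submit(c, attrs, job_file, queue_name, None)

    # Capture the error right away, before making any further PBS call.
    error_text = None
    if not job_id:
        errno, text = pbs.error()
        error_text = "PBS error %d: %s" % (errno, text)

    pbs.pbs_disconnect(c)
    return job_id, error_text
```

On a submission host this would be run as "import pbs" followed by
try_submit(pbs, "/path/to/test_job.sh", "batch"), where both the script path
and the queue name are placeholders. If this fails with the same PBS error,
the problem is between pbs_python and TORQUE rather than in Galaxy.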

> Also, could you please advise which Torque & pbs_python version combinations
> you have successfully tested against?

We're using an older version (2.1.11) on our submission hosts since we saw 
performance problems when using pbs_python with the newer 2.4.x versions.

The TORQUE server and execution hosts run 2.4.9.

> Regards,
> Sonali
> PS: pbs_python has a new version 4.3 out 
> (https://subtrac.sara.nl/oss/pbs_python/wiki/TorqueInstallation), why is this 
> not in the PSU egg repository yet? Would that make a difference?

I'm not sure if it would make a difference.  I upgrade the pbs_python egg as 
necessary or when it's particularly far out of date.

