Sonali Amonkar wrote:
> Hi Nate,
> 
> We are still awaiting a reply from the Torque community about this error. As 
> for debugging, we did try tracejob; however, since the job was never actually 
> submitted, Torque had no log entries for it (it wasn't even a job yet).
> Meanwhile, we are retrying the Galaxy deployment with a different version of 
> Torque (2.3.6) and pbs_python (2.6), but now face a new error:
> 
> galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,345 (34/2519.server) 
> Removed from PBS queue before job completion

This would indicate that the job is being stopped either by a user or by the 
job walltime or job output size limit configured in universe_wsgi.ini.
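One quick way to rule out the Galaxy-side limits is to check whether they are set at all. The sketch below assumes the limits live in the [app:main] section under the option names job_walltime and output_size_limit; those names are assumptions, so check them against your own universe_wsgi.ini.sample first.

```python
import configparser

# Assumed option names for Galaxy's job limits; verify them against your
# own universe_wsgi.ini.sample before relying on this.
LIMIT_OPTIONS = ("job_walltime", "output_size_limit")


def find_job_limits(ini_text, section="app:main"):
    """Return any of the assumed job-limit options set in the INI text."""
    # Disable interpolation so literal '%' characters in values
    # (e.g. log format strings) do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return {}
    return {
        name: parser.get(section, name)
        for name in LIMIT_OPTIONS
        if parser.has_option(section, name)
    }


sample = """
[app:main]
job_walltime = 24:00:00
"""
print(find_job_limits(sample))  # {'job_walltime': '24:00:00'}
```

If this returns an empty dict for the real file, neither limit is set, which points back at a user or TORQUE itself removing the job.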

--nate

> galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,344 (34/2519.server) PBS 
> job has left queue
> galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,351 Job output not returned 
> by PBS: the output datasets were deleted while the job was running, the job 
> was manually dequeued or there was a cluster error.
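These log lines pair the Galaxy job id with the PBS job id as "(galaxy_id/pbs_id)". Now that the job does reach the queue, TORQUE's tracejob can be pointed at the PBS id to see why it left. Here is a minimal sketch that pulls both ids out of such a line and builds the tracejob argument list. The helper name is hypothetical, and it only builds the command rather than running it.

```python
import re

# Matches the "(galaxy_id/pbs_id)" pair seen in the Galaxy PBS runner's
# log lines, e.g. "(34/2519.server)".
JOB_IDS = re.compile(r"\((\d+)/([^)\s]+)\)")


def tracejob_command(log_line):
    """Return (galaxy_id, ['tracejob', pbs_id]) for a log line, or None."""
    match = JOB_IDS.search(log_line)
    if match is None:
        return None
    galaxy_id, pbs_id = match.groups()
    # Only the argument list is built here; run it (e.g. via
    # subprocess.run) on a host that can read the TORQUE logs.
    return galaxy_id, ["tracejob", pbs_id]


line = ("galaxy.jobs.runners.pbs DEBUG 2011-02-25 04:59:18,345 "
        "(34/2519.server) Removed from PBS queue before job completion")
print(tracejob_command(line))  # ('34', ['tracejob', '2519.server'])
```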
> 
> One particular job gets removed, which causes the entire workflow to fail.
> Please let me know if you have any information on this, or if you have come 
> across this error before.
> 
> Many thanks for your time.
> 
> Regards,
> Sonali
> 
> -----Original Message-----
> From: Nate Coraor [mailto:n...@bx.psu.edu] 
> Sent: Tuesday, February 15, 2011 10:30 PM
> To: Sonali Amonkar
> Cc: Galaxy Dev
> Subject: Re: [galaxy-dev] Error with setuptools version in Galaxy 
> installation on Cluster
> 
> Sonali Amonkar wrote:
> > On further digging, we found that the script is failing in the following 
> > part of $GALAXY_HOME/lib/galaxy/jobs/runners/pbs.py:
> > 
> >         # submit
> >         galaxy_job_id = job_wrapper.job_id
> >         log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
> >         log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
> >         job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
> 
> This is the line here, it's failing to submit the job.
> 
> >         pbs.pbs_disconnect(c)
> > 
> >         # check to see if it submitted
> >         if not job_id:
> >             errno, text = pbs.error()
> >             log.debug( "(%s) pbs_submit failed, PBS error %d: %s" % (galaxy_job_id, errno, text) )
> >             job_wrapper.fail( "Unable to run this job due to a cluster error" )
> >             return
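The check in this excerpt relies on pbs_submit() returning a false value on failure, after which pbs.error() yields an (errno, text) pair. The sketch below reproduces that control flow outside Galaxy so it can be exercised without a cluster. FakePBS and submit_job are hypothetical stand-ins, not part of pbs_python or Galaxy. Logging at ERROR rather than DEBUG is a choice made here so the failure stays visible if the log level is raised. It is not what Galaxy does.

```python
import logging

log = logging.getLogger("pbs_submit_sketch")


class FakePBS:
    """Hypothetical stand-in for the pbs module that simulates a failed submit."""

    def pbs_submit(self, conn, attrs, job_file, queue, extend):
        return None  # no job id is returned when submission fails

    def error(self):
        return 99999, "simulated failure"  # placeholder, not a real PBS code

    def pbs_disconnect(self, conn):
        pass


def submit_job(pbs, conn, job_attrs, job_file, queue, galaxy_job_id):
    """Submit a job; return the PBS job id, or None after logging the error."""
    job_id = pbs.pbs_submit(conn, job_attrs, job_file, queue, None)
    pbs.pbs_disconnect(conn)
    if not job_id:
        # Same check as the excerpt: a false job id means the submit failed.
        errno, text = pbs.error()
        log.error("(%s) pbs_submit failed, PBS error %d: %s",
                  galaxy_job_id, errno, text)
        return None
    return job_id


logging.basicConfig(level=logging.DEBUG)
print(submit_job(FakePBS(), None, [], "job.sh", "batch", 34))  # None
```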
> > 
> > Could this be a problem with the pbs_python egg (pbs_python-4.1.0) used 
> > by Galaxy, or is it a Torque-specific issue? Just to reiterate, we are on a 
> > development snapshot of Torque, which is hard to replace since many other 
> > people are using it.
> 
> It's possible that pbs_python is generating code which is incompatible, but 
> since it's linked against your version of TORQUE this should not be the case.
> 
> It's hard to say exactly what's causing this since it's outside of Galaxy.  
> I'm not sure if TORQUE has any client-side debugging that would help with 
> this issue, but that's where I'd start.
> 
> > Also, could you please tell us which Torque and pbs_python version 
> > combinations you have successfully tested against?
> 
> We're using an older TORQUE version (2.1.11) on our submission hosts, since we 
> saw performance problems when using pbs_python with the newer 2.4.x versions.
> 
> The TORQUE server and execution hosts run 2.4.9.
> 
> > 
> > Regards,
> > Sonali
> > 
> > PS: pbs_python has a new version, 4.3, out 
> > (https://subtrac.sara.nl/oss/pbs_python/wiki/TorqueInstallation). Why is 
> > this not in the PSU egg repository yet? Would that make a difference?
> 
> I'm not sure if it would make a difference.  I upgrade the pbs_python egg as 
> necessary or when it's particularly far out of date.
> 
> --nate
> 
_______________________________________________
To manage your subscriptions to this and other Galaxy lists, please use the 
interface at:

  http://lists.bx.psu.edu/
