You mention that you moved it to an NFS volume - but it seems you also moved to 
a grid configuration using PBS?

If that's the case, what you are seeing might be an issue with NFS attribute 
caching or write caching, which causes files created on one machine to not 
appear on other machines until some time later. The PBS job notifications are 
not affected by filesystem latency, so Galaxy can learn that a job has finished 
before the job's output files are visible on the Galaxy host.

You can prove this by experiment if you alter the finish_job method in 
lib/galaxy/jobs/runners/pbs.py to do a sleep/wait loop, waiting up to 60 
seconds for the files to be readable. If that hack works, latency is your 
problem.
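
As a rough sketch of that experiment (not Galaxy's actual code), the wait loop 
could look something like the helper below. The function name 
wait_for_readable and its parameters are made up for illustration; you would 
call it from finish_job with whatever paths that method is about to read, 
before it tries to open them. It uses time.time() so it should run on both 
Python 2 and Python 3.

```python
import os
import time


def wait_for_readable(paths, timeout=60.0, interval=1.0):
    """Poll until every path in `paths` is a readable file.

    Hypothetical helper, not part of Galaxy. Returns True as soon as all
    files are readable, or False if `timeout` seconds pass first.
    """
    deadline = time.time() + timeout
    while True:
        # A file counts as ready when it exists and this process may read it.
        pending = [p for p in paths
                   if not (os.path.isfile(p) and os.access(p, os.R_OK))]
        if not pending:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

If the function returns True after a few seconds where the job previously 
failed, that points at NFS latency rather than a real cluster error.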

The solution is either to:

- Configure your mounts not to use attribute caching (for example, with the 
  NFS mount option "noac"; this has performance impacts), or

- Make the hack permanent.

This happened to us on SGE, which is why I know these details ;-}

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Luobin Yang
Sent: Monday, October 17, 2011 10:31 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] What's causing this error?

Hi,

Recently I moved my locally installed Galaxy from a local hard drive to an 
NFS-mounted hard drive. When I run some tools, I get the following error in 
the log file:

Job output not returned by PBS: the output datasets were deleted while the job 
was running, the job was manually dequeued or there was a cluster error.

I am pretty sure the job was not manually dequeued. Any idea how this happened 
and how this can be fixed?

Thanks,
Luobin
