You mention that you moved it to an NFS volume - but it seems you also moved to
a grid configuration using PBS?
If that's the case, what you are seeing might be an issue with NFS attribute
caching or write caching, which causes files created on one machine to not
appear on other machines until some time later. The PBS job notifications are
not impacted by the filesystem latencies, so Galaxy can be told that the job
has finished before the output files are visible to it.
You can prove this by experiment if you alter the finish_job method in
lib/galaxy/jobs/runners/pbs.py to do a sleep/wait loop, waiting up to 60
seconds for the files to be readable. If that hack works, latency is your
problem.
The solution is to either:
- Configure your mounts not to use attribute caching (e.g. the noac mount
  option; this has performance implications)
- Make the hack permanent.
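
For reference, the sleep/wait loop could look roughly like the sketch below.
This is not Galaxy's actual code: the helper name wait_for_readable and its
parameters are made up for illustration, and where exactly you call it inside
finish_job (and which file paths you pass in) is something you would have to
work out against your own copy of pbs.py.

```python
import time


def wait_for_readable(paths, timeout=60.0, interval=1.0):
    """Poll until every path can be opened for reading, or until timeout.

    Hypothetical helper, not part of Galaxy. Returns True if all files
    became readable within `timeout` seconds, False otherwise.
    """
    deadline = time.time() + timeout
    pending = list(paths)
    while True:
        still_pending = []
        for path in pending:
            try:
                # Actually open the file instead of only checking that it
                # exists, since the goal is to know it can be read.
                with open(path, "rb"):
                    pass
            except (IOError, OSError):
                # Not visible (or not readable) yet on this machine.
                still_pending.append(path)
        pending = still_pending
        if not pending:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

If a call like wait_for_readable([output_file, error_file], timeout=60) returns
True where the job previously failed, that points at NFS latency rather than a
real cluster error. The variable names in that call are placeholders too.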
This happened to us on SGE, which is why I know these details ;-}
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Luobin Yang
Sent: Monday, October 17, 2011 10:31 AM
Subject: [galaxy-dev] What's causing this error?
Recently I moved my locally installed Galaxy from a local hard drive to an
NFS-mounted hard drive. When I run some tools, I get the following error:
Job output not returned by PBS: the output datasets were deleted while the job
was running, the job was manually dequeued or there was a cluster error.
I am pretty sure the job was not manually dequeued. Any idea how this happened
and how this can be fixed?