Ed,

We had the classic goof on our cluster with this. Four nodes could not see the 
/home/galaxy folder due to a missing entry in /etc/fstab. When jobs hit 
those nodes (which explains the randomness), we got the error message.

What was bothersome was the lack of good logs to go on. The error message was 
too generic. However, I discovered that Galaxy was depositing the error and out 
messages in the /pbs folder, and you could briefly read them before they got 
deleted. There the message was the classic SGE input/output message: 
/home/galaxy.... file not found.

Hence my follow-up question: how can I get Galaxy NOT to delete these SGE 
error and out files?

best,
joe

________________________________
From: Edward Kirton [eskir...@lbl.gov]
Sent: Monday, November 28, 2011 4:15 PM
To: Nate Coraor
Cc: Joseph Hargitai; galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Job output not returned from cluster

hi, we've had this issue too -- in short, the cluster node(s) finish writing 
outfiles to disk, but the file system (inode metadata) isn't updated at the 
galaxy server yet when galaxy checks for the files.

turning the metadata caching off (as recommended on the galaxy wiki) isn't an 
option for me (and the performance hit would be significant), so i added some 
loops around the file checking (5sec sleep and retry up to 6 times).  there 
were a couple of places this probably should be done (not just .[eo]* log files 
but also the outfiles).
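
A minimal sketch of the kind of retry loop described above, not the actual 
edits made to Galaxy. The function name wait_for_file and its parameters are 
hypothetical; the defaults mirror the 5 second sleep and up to 6 retries 
mentioned in the message.

```python
import os
import time


def wait_for_file(path, retries=6, delay=5.0, sleep=time.sleep):
    """Return True once `path` is visible, False if it never appears.

    Hypothetical helper: checks once, then retries up to `retries` times,
    sleeping `delay` seconds between checks, to ride out stale NFS metadata.
    """
    for attempt in range(retries + 1):
        if os.path.exists(path):
            return True
        if attempt < retries:
            # Give the file server time to publish the new file's metadata.
            sleep(delay)
    return False
```

The same check would presumably be applied to the job's .o and .e files and 
to the outfiles before the job is declared failed. The sleep argument is only 
there so the wait can be replaced in a test.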

i am testing these hacks now but due to the intermittent nature of these 
errors, it'll be a few days before i know if this is working as expected.  once 
vetted, i will put these minor edits in a clone of galaxy-central so the 
changes can be picked up.

ed

On Mon, Oct 24, 2011 at 10:24 AM, Nate Coraor 
<n...@bx.psu.edu> wrote:
Joseph Hargitai wrote:
> Nate,
>
> This error is intermittent. If you resubmit the same job two or three times, 
> it then works.  Once we are over the midterm exams - which use Galaxy - we 
> will try to switch the filesystem from autofs to a hard mount. We suspect this 
> to be the issue.

Ah, I suspect this is attribute caching in NFS.  Try mounting with the
option 'noac' and see if it solves the problem.
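
One way to confirm which options a node actually mounted with is to read 
/proc/mounts on Linux. The sketch below is only an illustration: the helper 
name mount_options, the sample text, and the server name fileserver are 
hypothetical, and it assumes the usual whitespace-separated /proc/mounts 
layout.

```python
# Hypothetical sample in /proc/mounts layout; real contents will differ.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,relatime 0 0
fileserver:/export/home /home/galaxy nfs rw,noac,vers=3,hard 0 0
"""


def mount_options(mounts_text, mount_point):
    """Return the set of options for `mount_point`, or None if not mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, comma-separated options, ...
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match, since later mounts shadow earlier ones.
            found = set(fields[3].split(","))
    return found


options = mount_options(SAMPLE_MOUNTS, "/home/galaxy")
print("not mounted" if options is None else "noac" in options)
```

On a real node the text would come from open("/proc/mounts").read(). A result 
of None means the filesystem is not mounted on that node at all.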

> Could we suppress the SGE-style e and o files to resolve this issue, or does 
> Galaxy need the o file?

The filename is unimportant, but I doubt it's the cause.

> Do you have any idea how the URL is built for the Galaxy - UCSC page return 
> when the URL is :8080/galaxy and not just /galaxy?

Not off the top of my head.  I have this message marked, I'll take a
look as soon as I have time.

--nate

>
> thanks,
> joe
>
> ________________________________________
> From: Nate Coraor [n...@bx.psu.edu]
> Sent: Friday, October 21, 2011 10:26 AM
> To: Joseph Hargitai
> Cc: galaxy-dev@lists.bx.psu.edu
> Subject: Re: [galaxy-dev] Job output not returned from cluster
>
> Joseph Hargitai wrote:
> >
> > Hi,
> >
> > I was browsing through the list and found many entries for this issue but 
> > no definitive answer.
> >
> > We are actually running into this error for simple file uploads from the 
> > internal filesystem.
>
> Hi Joe,
>
> This error occurs when the job's standard output and error files are not
> found where Galaxy expects them, namely:
>
>     <cluster_files_directory>/<job_id>.o
>     <cluster_files_directory>/<job_id>.e
>
> Please check your queueing system to make sure it can correctly deliver
> these back from the execution hosts to the specified filesystem.
>
> --nate
>
> >
> > thanks,
> > joe
> >
>
> > ___________________________________________________________
> > Please keep all replies on the list by using "reply all"
> > in your mail client.  To manage your subscriptions to this
> > and other Galaxy lists, please use the interface at:
> >
> >   http://lists.bx.psu.edu/
>
>
>
>