On Fri, Jul 29, 2011 at 5:09 PM, Duddy, John <jdu...@illumina.com> wrote:
> We had similar problems on NFS mounts to Isilon. We traced it to
> the default timeout for attribute caching on NFS mounts, which
> does not force a re-read of directory contents (hence file existence
> or size) for up to 30 seconds.
>
> We worked around it by adding noac to the mount options, but this
> can drastically increase the network traffic to the Isilon, so there
> are tradeoffs to be made.
>
> Even when you solve this, NFSv2 does not have open-close write
> consistency, so it is possible for a job to complete on a node and
> Galaxy to try to read the output files while the compute node is
> still flushing its write cache to the file.
>
> All of these scenarios are unlikely on a busy cluster, on which
> job<->Galaxy interactions will likely occur far enough apart in
> time for the caches to clear on their own.
>
> John Duddy

Thanks for your comments, John. It's good to know others
have run into similar issues.

You may be right that on a real test load many of these issues
would go away - but at least some of the problems I was seeing
were at start-up or job submission time (and thus prior to the
cluster actually running the job).
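
For what it's worth, one way to tolerate the caching delay on the
reading side is to poll until a file is visible and its size has
stopped changing. The sketch below is only an illustration under my
own assumptions: wait_for_stable_file and its parameters are
hypothetical names, not anything in Galaxy, and it waits out the
attribute cache rather than bypassing it.

```python
import os
import time


def wait_for_stable_file(path, timeout=60.0, interval=1.0, stable_checks=2):
    """Poll until `path` exists and its size stops changing.

    Hypothetical helper (not part of Galaxy). Returns the size in bytes
    once `stable_checks` consecutive polls report the same size as the
    poll before them; raises TimeoutError if that does not happen
    within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    unchanged = 0
    while True:
        try:
            size = os.stat(path).st_size
        except FileNotFoundError:
            # Not visible yet, e.g. the client is still serving a stale
            # cached view of the directory.
            size = None
        if size is not None and size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return size
        else:
            unchanged = 0
        last_size = size
        if time.monotonic() >= deadline:
            raise TimeoutError("%s did not appear and settle in time" % path)
        time.sleep(interval)
```

The timeout would need to be longer than the attribute cache
lifetime John mentions (up to 30 seconds) to be of any use, and a
stable size is only a heuristic: it cannot prove the compute node
has finished flushing its writes.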

We may need to re-organise our network topology. Right now
there are probably too many routers/hubs/switches between
the Galaxy server and the cluster and associated storage,
making the mapped drive less responsive than it could be.

Regards,

Peter