We had similar problems on NFS mounts to Isilon. We traced it to the default 
timeout for attribute caching on NFS mounts, which does not force a re-read of 
directory contents (hence file existence or size) for up to 30 seconds.

We worked around it by adding noac to the mount options, but this can
drastically increase the network traffic to the Isilon, so there are tradeoffs
to be made.

Even when you solve this, NFSv2 does not have close-to-open cache consistency,
so it is possible for a job to complete on a node and for Galaxy to try to read
the output files while the compute node is still flushing its write cache to
the file.
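
As an alternative to a fixed sleep, the reading side can poll until the output
file looks settled. The following is only a minimal sketch, not Galaxy code:
the function name wait_for_stable_file and all of its parameters are
hypothetical, and "size unchanged for a few polls" is a heuristic. With
attribute caching enabled the client may still report stale sizes, so this
narrows the window rather than closing it.

```python
import os
import time


def wait_for_stable_file(path, timeout=60.0, interval=1.0, stable_checks=3):
    """Return True once `path` exists and its size has stayed the same for
    `stable_checks` consecutive polls; return False if `timeout` expires.

    Hypothetical helper: a heuristic only, not a consistency guarantee.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.stat(path).st_size
        except OSError:
            # File not visible yet (or a transient error such as a stale handle).
            size = None
        if size is not None and size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return True
        else:
            # Size changed or file missing: start counting again.
            unchanged = 0
        last_size = size
        time.sleep(interval)
    return False
```

A caller would use it as wait_for_stable_file(output_path, timeout=120) and
treat a False result as "output never appeared or never settled".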

All of these scenarios are unlikely on a busy cluster, on which job<->Galaxy 
interactions will likely occur far enough apart in time for the caches to clear 
on their own.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-----Original Message-----
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Peter Cock
Sent: Friday, July 29, 2011 6:36 AM
To: Galaxy Dev
Subject: [galaxy-dev] Problems with Galaxy on a mapped drive

Hi all,

In my recent email I mentioned problems with our setup and mapped drives. I
am running a test Galaxy on a server under a CIFS mapped drive. If I map the
drive with noperms then things seem to work with submitting jobs to the cluster
etc, but that doesn't seem secure at all. Mounting with strict permissions seems
to cause various network latency related problems in Galaxy though.

Specifically during loading the converters and history export tool, Galaxy
creates a temporary XML file which it then tries to parse. I was able to
resolve this by switching from tempfile.TemporaryFile to tempfile.mkstemp and
adding a 1s sleep, but it isn't very elegant. Couldn't you use a StringIO
handle instead?
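
The StringIO idea could look roughly like the sketch below. It assumes the
generated XML is already available as a string and is parsed with the standard
library's ElementTree; the XML content and the name parse_xml_in_memory are
made up for illustration and are not taken from Galaxy. It is written for
Python 3 (io.StringIO); the Python 2 code of the time would import the
StringIO module instead.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML standing in for whatever is generated at start-up.
xml_text = (
    '<tool id="example_converter" name="Example">'
    "<command>cat $input</command>"
    "</tool>"
)


def parse_xml_in_memory(text):
    """Parse XML from an in-memory file-like object, so nothing is written
    to (or re-read from) the network file system."""
    handle = io.StringIO(text)
    return ET.parse(handle).getroot()


root = parse_xml_in_memory(xml_text)
print(root.get("id"))
```

Because the handle never touches the mapped drive, there is no write-then-read
race to wait out.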

Later during start up there were two errors with a similar issue - Galaxy
creates a temp folder then immediately tries to write a tar ball or zip file
to it. Again, adding a 1 second sleep after creating the directory before
using it seems to work. See lib/galaxy/web/controllers/dataset.py

After that Galaxy started, but still gives problems - like the issue reported
here which Galaxy handled badly (see patch):
http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-July/006213.html

Here again, inserting a one second sleep between writing the cluster script
file and setting its permissions made it work.
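
One possible way to avoid the gap between writing the script and setting its
permissions is to request the mode when the file is created and to set it
again on the still-open descriptor, so no second lookup by path is needed.
This is a sketch only: write_executable_script is a hypothetical name, it is
untested on CIFS, and os.fchmod is only available on Unix.

```python
import os
import stat


def write_executable_script(path, contents, mode=0o755):
    """Write `contents` to `path` and make it executable without a separate
    chmod-by-path step. Hypothetical helper, Unix only."""
    # Ask for the final mode at creation time (subject to the process umask).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "w") as handle:
        handle.write(contents)
        handle.flush()
        # Push the data out of the local write cache before closing.
        os.fsync(handle.fileno())
        # Set the mode on the open descriptor in case umask masked some bits.
        os.fchmod(handle.fileno(), mode)
```

Whether the server honours the mode still depends on how the share is mounted.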

If those are the only issues, that can be dealt with. But are there likely to be
lots more similar problems of this nature later on? That is my worry.

How are most people setting up mapped drives for Galaxy with a cluster?

Thanks,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
