Dear Galaxy Developers,

I administer a Galaxy instance at Cold Spring Harbor Laboratory, which serves 
around 200 laboratory members.  While our initial hardware purchase has scaled 
well for the last 3 years, we are finding that we can't quite keep up with 
the rising demand for compute-intensive jobs, such as mapping.  We are hesitant 
to buy more hardware to support the load, since we can't expect that solution 
to scale.

Rather, we are attempting to set up Galaxy to queue jobs (especially mappers) 
out to the lab's HPCC to accommodate the increasing load.  While there is a 
good number of technical challenges involved in this strategy, I am only 
writing to ask about one: data locality.

Normally, all Galaxy datasets are stored directly on the private server hosting 
our Galaxy instance.  For security reasons, the HPCC cannot mount our Galaxy 
server's storage (i.e., so that jobs running there could read and write 
datasets).  However, we can mount a small portion of the HPCC file system on 
our Galaxy server.  Storage on the HPCC is at a premium, so we can't afford to 
let newly created (or copied) datasets simply sit there.  It follows that we 
need a mechanism for maintaining temporary storage in the (restricted) HPCC 
space which allows for transfer of input datasets to the HPCC (so they will be 
visible to jobs running there) and transfer of output datasets back to 
persistent storage on our server.
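
To make the requirement concrete, here is a rough sketch of the stage-in / 
run / stage-out / clean-up cycle I have in mind.  It is not existing Galaxy 
code: the name staged_job and its parameters are hypothetical, and it assumes 
that the mounted HPCC scratch area is reachable as an ordinary directory 
(scratch_root) from the Galaxy server.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def staged_job(inputs, outputs, scratch_root):
    """Hypothetical helper: stage inputs to scratch, unstage outputs afterwards.

    inputs:       persistent input dataset paths on the Galaxy server
    outputs:      persistent destination paths for the job's results
    scratch_root: directory on the mounted HPCC file system (assumed)
    Yields (staged_inputs, staged_outputs), the paths the job should use.
    """
    # One private scratch directory per job, so clean-up is a single rmtree.
    scratch = Path(tempfile.mkdtemp(dir=scratch_root))
    try:
        staged_in = []
        for i, src in enumerate(inputs):
            dst = scratch / f"input_{i}_{Path(src).name}"
            shutil.copy2(src, dst)  # stage in
            staged_in.append(dst)
        staged_out = [
            scratch / f"output_{i}_{Path(dst).name}"
            for i, dst in enumerate(outputs)
        ]
        yield staged_in, staged_out
        # Reached only if the job body did not raise: copy results back.
        for tmp, final in zip(staged_out, outputs):
            shutil.copy2(tmp, final)  # stage out
    finally:
        # Always release the scarce HPCC space, even when the job failed.
        shutil.rmtree(scratch, ignore_errors=True)
```

The job itself would be submitted inside the "with" block using the staged 
paths.  Because the scratch directory is removed on exit, nothing lingers in 
the HPCC space once the outputs have been copied back.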

I am in the process of analyzing when/where/how exact path names are 
substituted into tool command lines, looking for potential hooks to facilitate 
the staging/unstaging of data before/after job execution on the HPCC.  I have 
found a few places where I might try to insert logic for handling this case.
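
As an illustration of the kind of hook I mean, the sketch below rewrites a 
fully built command line so that persistent dataset paths are replaced by 
their staged counterparts.  The function name rewrite_command and the example 
paths are hypothetical and do not come from Galaxy's code; a cleaner 
solution would presumably substitute the staged paths before the command 
line is built rather than patching the string afterwards.

```python
import re


def rewrite_command(command, path_map):
    """Hypothetical helper: swap persistent paths for staged paths.

    command:  the tool command line as a single string
    path_map: dict mapping persistent path -> staged path (both strings)
    """
    if not path_map:
        return command
    # Longest paths first, so a path that is a prefix of another path
    # cannot shadow the longer one in the alternation.
    ordered = sorted(path_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    # A single pass over the string: text that has already been replaced
    # is never scanned again.
    return pattern.sub(lambda m: path_map[m.group(0)], command)
```

For example, with made-up paths:

```python
mapping = {
    "/galaxy/files/dataset_1.dat": "/hpcc/scratch/job42/input_0.dat",
    "/galaxy/files/dataset_2.dat": "/hpcc/scratch/job42/output_0.dat",
}
print(rewrite_command(
    "mapper --in /galaxy/files/dataset_1.dat --out /galaxy/files/dataset_2.dat",
    mapping,
))
```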

Before modifying too much of Galaxy's core code, I would like to know if there 
is a recommended method for handling this situation and whether other members 
of the Galaxy community have implemented fixes or workarounds for this or 
similar data locality issues.  If you can offer either type of information, I 
shall be most grateful.  Of course, if the answer is that there is no 
recommended or known technique, that would be valuable information too.

Thank you in advance,
Eric Paniagua

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
