On Feb 9, 2011, at 9:44 AM, Glen Beane wrote:

> I've been doing some testing with a Galaxy instance running on my laptop for 
> some tools we are developing.  I am uploading a file into Galaxy from a URL 
> to use as test input (~1.5GB, tabular).  I can download this file to my laptop 
> in ~30 seconds with wget, while if I pull from the same URL into Galaxy it 
> takes about 30 minutes.  I set the file type so Galaxy did not have to 
> auto-detect.
> This seems very slow considering it only takes about 30 seconds to get the 
> file over the network and write it to disk. What is Galaxy doing that makes 
> this file upload so slow?  We also tried defining our own datatype (data, not 
> tabular, with the thought that maybe Galaxy tried to examine tabular files), 
> but it is still very slow.  In production our input files will grow to be 
> much larger than this (although we'll probably abandon tabular for a more 
> compact binary format by then).

So, no insight as to why a 1.5GB file takes 60 times as long to load into Galaxy 
via URL as it takes to download from the same URL outside of Galaxy?  
I'm assuming it has to do with detecting metadata, since changing the file type 
from our custom tabular type to the Galaxy tabular type triggers a set-metadata 
job that takes at least 20 minutes (I didn't time it).  However, I changed our 
datatype from tabular to "data", hoping Galaxy would just ignore the file 
contents, and it still takes 30 minutes to load into Galaxy.
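
To separate plain transfer time from any per-line inspection of the contents, 
a rough baseline can be measured outside Galaxy.  The sketch below is only an 
illustration, not Galaxy's upload code: the function names, the 1 MiB chunk 
size, and the url/path arguments are all hypothetical, and it assumes (without 
confirmation) that a line-by-line pass over the file is the kind of work that 
could account for the gap.

```python
import time
import urllib.request

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def timed_copy(src, dst, chunk_size=CHUNK):
    """Copy src to dst in fixed-size chunks; return (bytes_copied, seconds)."""
    start = time.perf_counter()
    total = 0
    while True:
        block = src.read(chunk_size)
        if not block:  # empty read means end of stream
            break
        dst.write(block)
        total += len(block)
    return total, time.perf_counter() - start


def timed_line_scan(src):
    """Iterate over src line by line; return (line_count, seconds)."""
    start = time.perf_counter()
    lines = 0
    for _ in src:
        lines += 1
    return lines, time.perf_counter() - start


def fetch(url, path):
    """Download url to path, then time a separate line-by-line pass over it.

    Both url and path are placeholders supplied by the caller.
    """
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        copied = timed_copy(resp, out)
    with open(path, "rb") as f:
        scanned = timed_line_scan(f)
    return copied, scanned
```

If the chunked copy finishes in roughly the time wget takes and the line scan 
is also fast, the extra time is being spent somewhere other than transfer or a 
single pass over the file.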

We haven't updated to the latest galaxy-dist (syncing up is on our to-do list), 
but this seems to take much longer than it should, and it looks like a problem 
with the implementation.

Glen L. Beane
Software Engineer
The Jackson Laboratory
Phone (207) 288-6153

galaxy-dev mailing list
