On Feb 11, 2011, at 9:32 AM, Nate Coraor wrote:

> Glen Beane wrote:
>> On Feb 9, 2011, at 9:44 AM, Glen Beane wrote:
>>> I've been doing some testing with a Galaxy instance running on my laptop 
>>> for some tools we are developing.  I am uploading a file into Galaxy from a 
>>> URL to use as test input (~1.5GB tabular).  I can download this file to my 
>>> laptop in ~30 seconds with wget, while if I pull from the same URL into 
>>> Galaxy it takes about 30 minutes.  I set the file type so Galaxy did not 
>>> have to auto-detect.
>>> This seems very slow considering it only takes about 30 seconds to get the 
>>> file over the network and write it to disk. What is Galaxy doing that makes 
>>> this file upload so slow?  We also tried defining our own datatype (data, 
>>> not tabular, with the thought that maybe Galaxy tried to examine tabular 
>>> files), but it is still very slow.  In production our input files will grow 
>>> to be much larger than this (although we'll probably abandon tabular for a 
>>> more compact binary format by then).
>> So no insight as to why a 1.5GB file takes 60 times as long to load into 
>> Galaxy via URL as it takes to download the file from the same URL outside of 
>> Galaxy?  I'm assuming it has to do with detecting metadata, since changing 
>> the file type from our custom tabular type to the Galaxy tabular type causes 
>> a set metadata job that takes at least 20 minutes (I didn't time it).  
>> However, I changed our data type from tabular to "data" hoping Galaxy would 
>> just ignore the file contents, and it still takes 30 minutes to load into 
>> Galaxy.
>> We haven't updated to the latest galaxy-dist (it is on our to-do list to 
>> sync up), but this seems like it takes much longer than it should and is a 
>> problem with the implementation.
> Hi Glen,
> Sorry, I haven't had a chance to address your question yet.  The reason
> is most likely metadata, as you have surmised.  Do you have:
>  set_metadata_externally = True
> set in universe_wsgi.ini?

I'm not sure. I'll check.  What does this setting do?
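
For anyone else checking the same thing, here is a minimal sketch of how the 
setting could be looked up programmatically.  It assumes the option lives in 
an [app:main] section of universe_wsgi.ini (an assumption on my part, adjust 
the section name if your file differs), and the helper name 
metadata_set_externally is made up for illustration:

```python
import configparser


def metadata_set_externally(ini_text, section="app:main"):
    """Return True if set_metadata_externally is enabled in the ini text.

    The section name "app:main" is an assumption, not confirmed in this thread.
    """
    # interpolation=None so '%' characters in other options are left alone;
    # strict=False tolerates duplicate sections or options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # fallback=False covers both a missing section and a missing option.
    return parser.getboolean(section, "set_metadata_externally", fallback=False)


if __name__ == "__main__":
    sample = "[app:main]\nset_metadata_externally = True\n"
    print(metadata_set_externally(sample))
```

To run it against a real file, read the file contents first, e.g. 
metadata_set_externally(open("universe_wsgi.ini").read()).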

> Also, there are some recent changes in the newest dist release which
> limit the number of lines checked for metadata that should make this
> process much faster.

Thanks, we'll try to update our test Galaxy instance to the newest dist 
release to see if that helps.

Glen L. Beane
Software Engineer
The Jackson Laboratory
Phone (207) 288-6153
