On Feb 11, 2011, at 9:32 AM, Nate Coraor wrote:

> Glen Beane wrote:
>> 
>> On Feb 9, 2011, at 9:44 AM, Glen Beane wrote:
>> 
>>> I've been doing some testing with a Galaxy instance running on my laptop 
>>> for some tools we are developing.  I am uploading a file into Galaxy from a 
>>> URL to use as test input (~1.5GB tabular). I can download this file to my
>>> laptop in ~30 seconds with wget, while if I pull from the same URL into
>>> Galaxy it takes about 30 minutes.  I set the file type so Galaxy did not 
>>> have to auto-detect.
>>> 
>>> This seems very slow considering it only takes about 30 seconds to get the 
>>> file over the network and write it to disk. What is Galaxy doing that makes 
>>> this file upload so slow?  We also tried defining our own datatype ("data"
>>> rather than tabular, on the theory that Galaxy might be examining tabular
>>> files), but it is still very slow.  In production our input files will grow
>>> to be much larger than this (although we'll probably abandon tabular for a 
>>> more compact binary format by then).
>> 
>> 
>> So no insight as to why a 1.5GB file takes 60 times as long to load into 
>> Galaxy via URL as it takes to download the file from the same URL outside of
>> Galaxy?  I'm assuming it has to do with detecting metadata, since changing
>> the file type from our custom tabular type to the Galaxy tabular type causes
>> a set metadata job that takes at least 20 minutes (I didn't time it).  
>> However, I changed our data type from tabular to "data" hoping Galaxy would 
>> just ignore the file contents and it still takes 30 minutes to load into 
>> Galaxy.
>> 
>> We haven't updated to the latest galaxy-dist (it is on our todo list to
>> sync up), but this seems to take much longer than it should, which
>> suggests a problem with the implementation.
> 
> Hi Glen,
> 
> Sorry, I haven't had a chance to address your question yet.  The reason
> is most likely metadata as you have surmised.  Do you have:
> 
>  set_metadata_externally = True
> 
> set in universe_wsgi.ini?

I'm not sure. I'll check.  What does this setting do?
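
For anyone else wanting to check the same thing, here is a minimal sketch in Python. It assumes universe_wsgi.ini is a standard INI file with the option under an [app:main] section, and that an unset option means False; both are assumptions on my part, and the helper name is made up for illustration.

```python
import configparser


def metadata_set_externally(ini_text, section="app:main"):
    """Return True if set_metadata_externally is enabled in the given INI text.

    Assumptions (not confirmed in this thread): the option lives in the
    [app:main] section, and a missing option is treated as False.
    """
    # interpolation=None so values containing '%' (e.g. %(here)s) are not expanded
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser.getboolean(section, "set_metadata_externally", fallback=False)


# Hypothetical config fragment used only to exercise the helper
sample = """
[app:main]
set_metadata_externally = True
"""

print(metadata_set_externally(sample))
```

To run it against a real config, read the file contents and pass them in, e.g. metadata_set_externally(open("universe_wsgi.ini").read()). A line that is commented out with a leading # counts as unset.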



> Also, there are some recent changes in the newest dist release which
> limit the number of lines checked for metadata that should make this
> process much faster.

Thanks, we'll try to update our test Galaxy instance to the newest dist
release to see if that helps.

--
Glen L. Beane
Software Engineer
The Jackson Laboratory
Phone (207) 288-6153




_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
