Glen Beane wrote:
> 
> On Feb 9, 2011, at 9:44 AM, Glen Beane wrote:
> 
> > I've been doing some testing with a Galaxy instance running on my laptop 
> > for some tools we are developing.  I am uploading a file into Galaxy from a 
> > URL to use as test input (~1.5GB tabular).  I can download this file to my 
> > laptop in ~30 seconds with wget, while if I pull from the same URL into 
> > Galaxy it takes about 30 minutes.  I set the file type so Galaxy did not 
> > have to auto-detect.
> > 
> > This seems very slow considering it only takes about 30 seconds to get the 
> > file over the network and write it to disk. What is Galaxy doing that makes 
> > this file upload so slow?  We also tried defining our own datatype (data, 
> > not tabular, with the thought that maybe Galaxy tried to examine tabular 
> > files), but it is still very slow.  In production our input files will grow 
> > to be much larger than this (although we'll probably abandon tabular for a 
> > more compact binary format by then).
> 
> 
> So no insight as to why a 1.5GB file takes 60 times as long to load into 
> Galaxy via URL as it takes to download the file from the same URL outside of 
> Galaxy?  I'm assuming it has to do with detecting metadata, since changing 
> the file type from our custom tabular type to the Galaxy tabular type causes 
> a set metadata job that takes at least 20 minutes (I didn't time it).  
> However, I changed our data type from tabular to "data" hoping Galaxy would 
> just ignore the file contents and it still takes 30 minutes to load into 
> Galaxy.
> 
> We haven't updated to the latest galaxy-dist (it is on our todo list to sync 
> up), but this seems like it takes much longer than it should and is a problem 
> with the implementation.

Hi Glen,

Sorry, I haven't had a chance to address your question yet.  The reason
is most likely metadata as you have surmised.  Do you have:

  set_metadata_externally = True

set in universe_wsgi.ini?

Also, there are some recent changes in the newest dist release which
limit the number of lines checked for metadata that should make this
process much faster.
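
To illustrate why capping the number of lines matters, here is a rough
sketch of line-limited metadata detection for a tabular file.  This is
not Galaxy's actual code: the function name sniff_tabular_metadata and
the max_lines default are hypothetical, chosen only for illustration.
The point is that the cost of the scan is bounded by max_lines rather
than by the size of the file.

```python
import io


def sniff_tabular_metadata(stream, max_lines=100000):
    """Scan at most max_lines lines of a tab-separated stream.

    Hypothetical helper, not part of Galaxy. Returns a tuple of
    (widest column count seen, lines scanned, whether the scan stopped early).
    """
    max_columns = 0
    scanned = 0
    truncated = False
    for line in stream:
        if scanned >= max_lines:
            # More data remains, but we stop here to bound the cost.
            truncated = True
            break
        scanned += 1
        line = line.rstrip("\r\n")
        # Skip blank lines and comment lines when counting columns.
        if not line or line.startswith("#"):
            continue
        max_columns = max(max_columns, len(line.split("\t")))
    return max_columns, scanned, truncated


if __name__ == "__main__":
    sample = io.StringIO("chr1\t100\t200\n" * 10)
    print(sniff_tabular_metadata(sample, max_lines=3))
```

With a cap like this, a 1.5GB file and a 1.5MB file cost about the same
to inspect, at the price of metadata (such as the line count) being an
estimate when the scan stops early.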

--nate

> 
> 
> --
> Glen L. Beane
> Software Engineer
> The Jackson Laboratory
> Phone (207) 288-6153
> 
> 
> 
> 
> _______________________________________________
> galaxy-dev mailing list
> galaxy-dev@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-dev