Glen Beane wrote:
> On Feb 9, 2011, at 9:44 AM, Glen Beane wrote:
> > I've been doing some testing with a Galaxy instance running on my laptop
> > for some tools we are developing. I am uploading a file into Galaxy from a
> > URL to use as test input (~1.5GB tabular). I can download this file to my
> > laptop in ~30 seconds with wget, while if I pull from the same URL into
> > Galaxy it takes about 30 minutes. I set the file type so Galaxy did not
> > have to auto-detect.
> > This seems very slow considering it only takes about 30 seconds to get the
> > file over the network and write it to disk. What is Galaxy doing that makes
> > this file upload so slow? We also tried defining our own datatype (data,
> > not tabular, with the thought that maybe Galaxy tried to examine tabular
> > files), but it is still very slow. In production our input files will grow
> > to be much larger than this (although we'll probably abandon tabular for a
> > more compact binary format by then).
> So no insight as to why a 1.5GB file takes 60 times as long to load into
> Galaxy via URL as it takes to download the file from the same URL outside of
> Galaxy? I'm assuming it has to do with detecting metadata, since changing
> the file type from our custom tabular type to the Galaxy tabular type causes
> a set metadata job that takes at least 20 minutes (I didn't time it).
> However, I changed our data type from tabular to "data" hoping Galaxy would
> just ignore the file contents and it still takes 30 minutes to load into
> Galaxy.
> We haven't updated to the latest galaxy-dist (it is on our todo list to sync
> up), but this seems like it takes much longer than it should and is a problem
> with the implementation.
Sorry, I haven't had a chance to address your question yet. The reason
is most likely metadata as you have surmised. Do you have:
    set_metadata_externally = True
set in universe_wsgi.ini?
Also, there are some recent changes in the newest dist release that
limit the number of lines checked for metadata, which should make this
process much faster.
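
If it helps, here is a rough sketch for checking whether that option is
enabled in a given ini file. It uses only the Python standard library. The
function name metadata_set_externally is made up for illustration, and the
[app:main] section name is an assumption about how universe_wsgi.ini is laid
out, so adjust it if your file differs.

```python
import configparser


def metadata_set_externally(ini_path, section="app:main"):
    """Return True if set_metadata_externally is enabled in the ini file.

    The default section name "app:main" is an assumption; pass another
    name if your configuration keeps the option elsewhere.
    """
    # interpolation=None leaves '%' characters in values untouched;
    # strict=False tolerates duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    if not parser.read(ini_path):
        # read() returns the list of files it managed to parse
        raise FileNotFoundError(ini_path)
    # A missing section or option is treated as "not enabled"
    return parser.getboolean(section, "set_metadata_externally", fallback=False)
```

Calling metadata_set_externally("universe_wsgi.ini") from the Galaxy root
directory should report whether the option is on. A missing option is
reported as False here, which is a choice made for this sketch and not
necessarily Galaxy's own default.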
> Glen L. Beane
> Software Engineer
> The Jackson Laboratory
> Phone (207) 288-6153
> galaxy-dev mailing list