Glen Beane wrote:
>
> On Feb 9, 2011, at 9:44 AM, Glen Beane wrote:
>
> > I've been doing some testing with a Galaxy instance running on my laptop
> > for some tools we are developing. I am uploading a file into Galaxy from a
> > URL to use as test input (~1.5GB tabular). I can download this file to my
> > laptop in ~30 seconds with wget, while if I pull from the same URL into
> > Galaxy it takes about 30 minutes. I set the file type so Galaxy did not
> > have to auto-detect.
> >
> > This seems very slow considering it only takes about 30 seconds to get the
> > file over the network and write it to disk. What is Galaxy doing that makes
> > this file upload so slow? We also tried defining our own datatype (data,
> > not tabular, with the thought that maybe Galaxy tried to examine tabular
> > files), but it is still very slow. In production our input files will grow
> > to be much larger than this (although we'll probably abandon tabular for a
> > more compact binary format by then).
>
>
> So no insight as to why a 1.5GB file takes 60 times as long to load into
> Galaxy via URL as it takes to download the file from the same URL outside of
> Galaxy? I'm assuming it has to do with detecting metadata, since changing
> the file type from our custom tabular type to the Galaxy tabular type causes
> a set metadata job that takes at least 20 minutes (I didn't time it).
> However, I changed our data type from tabular to "data", hoping Galaxy would
> just ignore the file contents, and it still takes 30 minutes to load into
> Galaxy.
>
> We haven't updated to the latest galaxy-dist (it is on our todo list to sync
> up), but this seems like it takes much longer than it should and is a problem
> with the implementation.

Hi Glen,

Sorry, I haven't had a chance to address your question yet. The reason is
most likely metadata, as you have surmised. Do you have:

    set_metadata_externally = True

set in universe_wsgi.ini? Also, there are some recent changes in the newest
dist release which limit the number of lines checked for metadata that should
make this process much faster.

--nate

>
> --
> Glen L. Beane
> Software Engineer
> The Jackson Laboratory
> Phone (207) 288-6153
>
> _______________________________________________
> galaxy-dev mailing list
> galaxy-dev@lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-dev