We routinely add large compressed FASTQ files to data libraries by that
method (linking, no copy), and it has been very fast since the patch that
stopped Galaxy from decompressing the files.
You should probably make sure you specify the file format (fastqsanger) so
Galaxy does not attempt to sniff the file to learn its datatype.
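As a sanity check on the "more than 10 minutes" figure, the elapsed time can
be read off the two galaxy.jobs lines in the log quoted below. This is a
minimal Python sketch, assuming the timestamps use the comma-separated
millisecond format shown there. The function and constant names are only
illustrative and are not part of Galaxy.

```python
from datetime import datetime

# Timestamps copied from the "dispatched" and "ended" galaxy.jobs log lines.
DISPATCHED = "2011-09-19 18:05:08,641"
ENDED = "2011-09-19 18:16:52,822"

# Assumed log timestamp layout: date, time, then a comma and milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


seconds = elapsed_seconds(DISPATCHED, ENDED)
minutes, remainder = divmod(seconds, 60)
print(f"{int(minutes)} min {remainder:.3f} s")
```

For job 120 this comes to a little under 12 minutes between dispatch and end.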
Sr. Staff Software Engineer
9885 Towne Centre Drive
San Diego, CA 92121
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Jennifer Jackson
Sent: Thursday, September 29, 2011 12:13 PM
To: Roman Valls; Galaxy-Dev
Subject: Re: [galaxy-dev] [galaxy-user] Add library to dataset performance
metric: developer vs production instances
This is a good question for the development community to provide
feedback on, so I'll cross-post your question over to that list.
On 9/19/11 2:30 PM, Roman Valls wrote:
> Today I was routinely adding a 27GB Illumina lane on my galaxy instance
> running on a cluster node. Just the regular cloned-from-hg type of
> instance with set_metadata_externally, no more tuning.
> It took more than 10 minutes to have the dataset imported into a data
> library via the filesystem path upload method... not copying it into
> galaxy, just "linking".
> galaxy.jobs INFO 2011-09-19 18:05:08,641 job 120 dispatched
> galaxy.jobs DEBUG 2011-09-19 18:16:52,822 job 120 ended
> galaxy.datatypes.metadata DEBUG 2011-09-19 18:16:52,824 Cleaning up
> external metadata files
> Since I cannot add datasets to libraries in usegalaxy.org and compare, I
> was wondering if someone can state an approximate average time *for a
> production* galaxy installation to do that operation.
> I would like to have some empirical number to show on how a production
> deployment could speed things up, as opposed to having individual
> galaxy instances per user in a cluster (as per IT policies):
> Thanks in advance !
>  http://usegalaxy.org/production
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org. Please keep all replies on the list by
> using "reply all" in your mail client. For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at: