Dear list,

I thought I was already working with fairly large datasets, but they have recently started to include ~2 GB files in sets of more than 50. I have run this sort of thing before as merged data, using tar to roll the files up into one set, but with tarfiles of >100 GB, Galaxy on EC2 seems to get very slow. That is probably down to my implementation of dataset type detection, which untars the archive and reads through the files.

Since tarring/untarring isn't very clean, I want to switch from tarring to creating composite files on merge, by putting a tool's results into the dataset.extra_files_path. This doesn't seem to be supported yet, because do_merge currently passes only the output dataset.filename to the respective datatype's merge method.

I would like to pass more data to the merge method (let's say the whole dataset object) to be able to get the composite files directory and 'merge' the files in there. Good idea, bad idea? If anyone has views on this, I'd love to hear them.
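
To make the proposal concrete, here is a minimal sketch of what such a merge could look like if it received the dataset object rather than just a path. It is not Galaxy code: FakeDataset is a hypothetical stand-in that only carries the two attributes mentioned above (filename and extra_files_path), and merge_into_extra_files and its arguments are names made up for illustration.

```python
import os
import shutil


class FakeDataset:
    """Hypothetical stand-in for a dataset object (not a Galaxy class)."""

    def __init__(self, filename, extra_files_path):
        self.filename = filename                  # primary file of the dataset
        self.extra_files_path = extra_files_path  # composite files directory


def merge_into_extra_files(split_files, output_dataset):
    """'Merge' by moving each split result into the composite directory."""
    os.makedirs(output_dataset.extra_files_path, exist_ok=True)
    names = []
    for index, path in enumerate(split_files):
        # Prefix with the index so parts with the same basename do not collide.
        name = "%d_%s" % (index, os.path.basename(path))
        shutil.move(path, os.path.join(output_dataset.extra_files_path, name))
        names.append(name)
    # The primary file only lists the parts, one per line.
    with open(output_dataset.filename, "w") as handle:
        handle.write("\n".join(names) + "\n")
    return names
```

Moving rather than copying is a deliberate choice in this sketch: with ~2 GB parts, a rename on the same filesystem avoids rewriting the data, which is the cost the tar approach incurs. Writing a small index into the primary file is just one option for what the dataset itself could contain.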

