I thought I was working with fairly large datasets, but they have
recently started to include ~2 GB files in sets of more than 50. I have
run this sort of thing before as merged data, using tar to roll the
files up into one set, but with >100 GB tarfiles Galaxy on EC2 gets
very slow, although that's probably due to my implementation of
dataset type detection (untar and read through the files).
Since tarring/untarring isn't very clean, I want to switch from tarring
to creating composite files on merge, by putting a tool's results into
dataset.extra_files_path. This doesn't seem to be supported yet,
because do_merge currently passes only the output dataset.filename to
the respective datatype's merge method.
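For context, the current hook looks roughly like this (paraphrased from
memory, not verbatim Galaxy source; the real method lives on the
datatype classes):

    class Data(object):
        @staticmethod
        def merge(split_files, output_file):
            # do_merge hands the datatype only the flat output filename,
            # so all it can do is combine chunks into that one file.
            with open(output_file, "wb") as out:
                for path in split_files:
                    with open(path, "rb") as chunk:
                        out.write(chunk.read())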
I would like to pass more data to the merge method (say, the whole
dataset object) so that it can get at the composite files directory and
'merge' the files in there; see the sketch below. Good idea, bad idea?
If anyone has views on this, I'd love to hear them.
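Concretely, something like this (the CompositeData name and the merge
signature are hypothetical, just to illustrate what I mean):

    import os
    import shutil

    class CompositeData(object):
        @staticmethod
        def merge(split_files, output_dataset):
            # Hypothetical signature: do_merge would pass the dataset
            # object instead of dataset.filename, so the datatype can
            # reach extra_files_path and collect chunk results there.
            extra_dir = output_dataset.extra_files_path
            if not os.path.exists(extra_dir):
                os.makedirs(extra_dir)
            for idx, path in enumerate(split_files):
                # An index prefix keeps chunk files apart; a real
                # datatype would pick names meaningful to its format.
                dest = os.path.join(
                    extra_dir, "%d_%s" % (idx, os.path.basename(path)))
                shutil.copy(path, dest)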