A converted dataset would be fine too.
I'm working on an enhancement that would allow the metadata to be provided when
the file is uploaded/registered via the API. So to do what you say, I'd need to
have a way of providing that converted dataset.
The files I'm talking about are concatenated GZIP files, and the GZIP format
specification doesn't contain any information about the size of the compressed
data, only the uncompressed size (and then, modulo 2^32). AFAIK, anything in
Galaxy that tried to create the auxiliary index would need to read and
decompress all the data in the file to do that - easily an hour's worth of work
for some of our full genome runs. We have all that information already when we
make the file, so I'd prefer to just give it to Galaxy at the start. I could
place stuff in a special section in the first GZIP header, but then this
capability would not be as general-purpose as it could be.
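
To illustrate the cost described above, here is a minimal Python 3 sketch of
building a member-offset index for a concatenated GZIP file and then using one
entry to seek directly to a member. It is not Galaxy code: the function names
index_gzip_members and read_member are hypothetical, and it assumes the file is
nothing but back-to-back GZIP members with no padding between them. Because the
format does not record compressed sizes, the indexer has to inflate every byte
just to find where each member ends.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tell zlib to expect a gzip header/trailer


def index_gzip_members(path, chunk_size=1 << 16):
    """Return [(compressed_offset, uncompressed_offset), ...], one per member.

    Hypothetical helper: the whole file is decompressed to locate the
    member boundaries, which is the expensive step.
    """
    index = []
    comp_offset = 0    # file offset where the current member starts
    uncomp_offset = 0  # uncompressed bytes that precede the current member
    with open(path, "rb") as fh:
        buf = fh.read(chunk_size)
        while buf:
            index.append((comp_offset, uncomp_offset))
            decomp = zlib.decompressobj(GZIP_WBITS)
            consumed = 0
            while not decomp.eof:
                if not buf:
                    buf = fh.read(chunk_size)
                    if not buf:
                        raise EOFError("truncated gzip member")
                uncomp_offset += len(decomp.decompress(buf))
                # Bytes past the end of this member are left in unused_data.
                consumed += len(buf) - len(decomp.unused_data)
                buf = decomp.unused_data
            comp_offset += consumed
            if not buf:
                buf = fh.read(chunk_size)
    return index


def read_member(path, comp_offset, chunk_size=1 << 16):
    """Decompress exactly one member starting at a known compressed offset."""
    parts = []
    with open(path, "rb") as fh:
        fh.seek(comp_offset)
        decomp = zlib.decompressobj(GZIP_WBITS)
        while not decomp.eof:
            chunk = fh.read(chunk_size)
            if not chunk:
                raise EOFError("truncated gzip member")
            parts.append(decomp.decompress(chunk))
    return b"".join(parts)
```

With an index supplied up front, only read_member is needed at access time, so
the full decompression pass is skipped.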
I also want to avoid unnecessary gzip decompression in Python, because
serious decompression in versions before 2.7 is so slow as to be unusable for
Is there a way to upload that converted dataset when I upload/register the main
file? I'd also need to know how to write such a file.
Sr. Staff Software Engineer
9885 Towne Centre Drive
San Diego, CA 92121
From: James Taylor [mailto:ja...@jamestaylor.org]
Sent: Friday, August 26, 2011 5:37 AM
To: Duddy, John
Subject: Re: [galaxy-dev] Storing a dict as metadata
Hey John, are you sure you don't want to use a "converted dataset" rather than
a metadata element for this? This is how we handle most types of secondary
indexes for visualization.
If you do it this way, the converter that creates the offset index is just
another tool (but registered in datatypes_conf.xml) and the index itself is
another dataset that can be accessed through the converted datasets
On Aug 25, 2011, at 6:12 PM, Duddy, John wrote:
> I'd like to have a datatype with a dict as metadata. This dict() would store
> file offsets to enable seeking around to process different sections of the
> How do I add a dictionary data metadata element?
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail: jdu...@illumina.com
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at: