On Tue, Sep 6, 2011 at 5:00 PM, Nate Coraor <n...@bx.psu.edu> wrote:
> Peter Cock wrote:
>> On Tue, Sep 6, 2011 at 3:24 PM, Nate Coraor <n...@bx.psu.edu> wrote:
>> > Ideally, there'd just be a column on the dataset table indicating
>> > whether the dataset is compressed or not, and then tools get a new
>> > way to indicate whether they can directly read compressed inputs, or
>> > whether the input needs to be decompressed first.
>> >
>> > --nate
>> Yes, that's what I was envisioning, Nate.
>> Are there any schemes other than gzip which would make sense?
>> Perhaps rather than a boolean column (compressed or not), it
>> should specify the kind of compression if any (e.g. gzip).
> Makes sense.
>> We need something which balances compression efficiency (size)
>> with decompression speed, while also being widely supported in
>> libraries for maximum tool uptake.
> Yes, and there's a side effect of allowing this: you may decrease
> efficiency if the tools used downstream all require decompression,
> and you waste a bunch of time decompressing the dataset multiple
> times.

Decompression costs CPU time, which slows things down, but a
compressed file also means less data IO from disk (which may be
network mounted), which speeds things up. So depending on the setup
and the task at hand, the compressed route could be faster overall.
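
As a rough way to check this trade-off on a given machine, here is a
minimal sketch that writes the same made-up data as a plain file and
as a gzip file, then times reading each back. The file names, the
sample data and the helper functions are all hypothetical and are not
part of Galaxy. Note that freshly written files will usually sit in
the OS page cache, so this mostly measures the CPU cost; to see the
IO side you would need to point it at large files on the real
(possibly network mounted) storage with a cold cache.

```python
import gzip
import os
import tempfile
import time


def write_sample(directory, n_lines=20000):
    """Write identical made-up FASTA-like data as plain and gzip files."""
    data = b"".join(
        b">seq%d\nACGTACGTACGTACGTACGT\n" % i for i in range(n_lines)
    )
    plain_path = os.path.join(directory, "sample.txt")
    gzip_path = plain_path + ".gz"
    with open(plain_path, "wb") as handle:
        handle.write(data)
    with gzip.open(gzip_path, "wb") as handle:
        handle.write(data)
    return plain_path, gzip_path


def timed_read(path, opener):
    """Read the whole file; return (uncompressed byte count, seconds)."""
    start = time.perf_counter()
    with opener(path, "rb") as handle:
        n_bytes = len(handle.read())
    return n_bytes, time.perf_counter() - start


def compare(directory):
    """Return on-disk sizes and read timings for both copies."""
    plain_path, gzip_path = write_sample(directory)
    plain_bytes, plain_secs = timed_read(plain_path, open)
    gzip_bytes, gzip_secs = timed_read(gzip_path, gzip.open)
    return {
        "plain_size": os.path.getsize(plain_path),
        "gzip_size": os.path.getsize(gzip_path),
        "plain_bytes": plain_bytes,
        "gzip_bytes": gzip_bytes,
        "plain_secs": plain_secs,
        "gzip_secs": gzip_secs,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in compare(tmp).items():
            print(key, value)
```

Which copy reads faster will depend on the disk, the network and the
CPU, so the timings are only meaningful when measured on the setup in
question.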

Is it time to file an issue on bitbucket to track this potential
