On Sun, Mar 1, 2020 at 3:14 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Le 01/03/2020 à 22:01, Wes McKinney a écrit :
> > In the context of a "next version of the Feather format" ARROW-5510
> > (which is consumed only by Python and R at the moment), I have been
> > looking at compressing buffers using fast compressors like ZSTD when
> > writing the RecordBatch bodies. This could be handled privately as an
> > implementation detail of the Feather file, but since ZSTD compression
> > could improve throughput in Flight, for example, I thought I would
> > bring it up for discussion.
> >
> > I can see two simple compression strategies:
> >
> > * Compress the entire message body in one shot, writing the result out
> > with an 8-byte int64 prefix indicating the uncompressed size
> > * Compress each non-zero-length constituent Buffer prior to writing to
> > the body (and using the same uncompressed-length-prefix when writing
> > the compressed buffer)
> >
> > The latter strategy is preferable for scenarios where we may project
> > out only a few fields from a larger record batch (such as reading from
> > a memory-mapped file).
>
> Agreed.  It may also allow using different compression strategies for
> different kinds of buffers (for example a bytestream splitting strategy
> for floats and doubles, or a delta encoding strategy for integers).

If we want to allow different compression to be applied to different
buffers, I think we would need a new Message type, because the
per-buffer compression information would inflate metadata sizes in a
way that is unlikely to be acceptable for the current uncompressed use
case.

Here is my strawman proposal:

https://github.com/apache/arrow/compare/master...wesm:compression-strawman
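
As a rough illustration of the second strategy quoted above (this is
not the code in the branch linked above), here is a minimal Python
sketch. Several things in it are assumptions made for the example:
zlib from the standard library stands in for ZSTD, the int64 prefix is
written little-endian, buffer padding and alignment are ignored, and
the function names (compress_buffer, decompress_buffer, write_body,
read_buffer) are hypothetical.

```python
import struct
import zlib


def compress_buffer(buf: bytes) -> bytes:
    """Prefix the compressed payload with the int64 uncompressed length.

    Zero-length buffers are written as-is (nothing to compress).
    zlib is only a stand-in for ZSTD here.
    """
    if len(buf) == 0:
        return b""
    return struct.pack("<q", len(buf)) + zlib.compress(buf)


def decompress_buffer(data: bytes) -> bytes:
    """Inverse of compress_buffer; checks the recorded uncompressed size."""
    if len(data) == 0:
        return b""
    (size,) = struct.unpack_from("<q", data, 0)
    out = zlib.decompress(data[8:])
    if len(out) != size:
        raise ValueError("uncompressed size does not match the prefix")
    return out


def write_body(buffers):
    """Concatenate compressed buffers.

    Returns the body and an (offset, length) entry per buffer, playing
    the role of the buffer layout recorded in the message metadata.
    """
    body = bytearray()
    layout = []
    for buf in buffers:
        chunk = compress_buffer(buf)
        layout.append((len(body), len(chunk)))
        body += chunk
    return bytes(body), layout


def read_buffer(body: bytes, layout, index: int) -> bytes:
    """Decompress a single buffer without touching the others."""
    offset, length = layout[index]
    return decompress_buffer(body[offset:offset + length])


# Example: three buffers, one of them empty.
buffers = [b"", struct.pack("<4i", 1, 2, 3, 4), b"abc" * 100]
body, layout = write_body(buffers)
projected = read_buffer(body, layout, 2)  # only buffer 2 is decompressed
```

Because each buffer is compressed independently, projecting a single
field only pays the decompression cost for that field's buffers, which
is the advantage over compressing the whole body in one shot.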

> > Implementation could be accomplished by one of the following methods:
> >
> > * Setting a field in Message.custom_metadata
> > * Adding a new field to Message
>
> I think it has to be a new field in Message.  Making it an ignorable
> metadata field means non-supporting receivers will decode and interpret
> the data wrongly.
>
> Regards
>
> Antoine.
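
To make Antoine's concern concrete, here is a small Python sketch of
what happens when a receiver does not know the body is compressed and
reads the bytes as raw int32 values. It is only an illustration under
stated assumptions: zlib stands in for ZSTD, the prefix is a
little-endian int64, and naive_read_int32 / aware_read_int32 are
hypothetical helpers, not Arrow APIs.

```python
import struct
import zlib

values = list(range(8))
raw = struct.pack("<8i", *values)  # 32 bytes of little-endian int32

# Body as a compressing writer would emit it:
# int64 uncompressed length, then the compressed payload.
body = struct.pack("<q", len(raw)) + zlib.compress(raw)


def naive_read_int32(body: bytes, count: int):
    """Receiver that ignored the compression marker: reads raw bytes."""
    return list(struct.unpack_from(f"<{count}i", body, 0))


def aware_read_int32(body: bytes, count: int):
    """Receiver that understands the prefix and decompresses first."""
    (size,) = struct.unpack_from("<q", body, 0)
    data = zlib.decompress(body[8:])
    if len(data) != size:
        raise ValueError("uncompressed size does not match the prefix")
    return list(struct.unpack_from(f"<{count}i", data, 0))


wrong = naive_read_int32(body, 2)   # the length prefix read as two int32s
right = aware_read_int32(body, 8)   # the original values
```

The naive reader raises no error; it silently returns the length prefix
reinterpreted as data, which is the "decode and interpret the data
wrongly" failure mode described above.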
