I also support compression at the buffer level, and making it an extra
message.

Speaking of compression and Flight, has anyone tested using gRPC's
compression to compress at the transport level (if that's the correct way to
describe it)? I believe only gzip and brotli are currently supported, so
that might be insufficient.

On Sun, 01 Mar 2020, 23:14 Antoine Pitrou, <anto...@python.org> wrote:

>
> On 01/03/2020 at 22:01, Wes McKinney wrote:
> > In the context of a "next version of the Feather format" ARROW-5510
> > (which is consumed only by Python and R at the moment), I have been
> > looking at compressing buffers using fast compressors like ZSTD when
> > writing the RecordBatch bodies. This could be handled privately as an
> > implementation detail of the Feather file, but since ZSTD compression
> > could improve throughput in Flight, for example, I thought I would
> > bring it up for discussion.
> >
> > I can see two simple compression strategies:
> >
> > * Compress the entire message body in one-shot, writing the result out
> > with an 8-byte int64 prefix indicating the uncompressed size
> > * Compress each non-zero-length constituent Buffer prior to writing to
> > the body (and using the same uncompressed-length-prefix when writing
> > the compressed buffer)
> >
> > The latter strategy is preferable for scenarios where we may project
> > out only a few fields from a larger record batch (such as reading from
> > a memory-mapped file).
>
> Agreed.  It may also allow using different compression strategies for
> different kinds of buffers (for example a bytestream splitting strategy
> for floats and doubles, or a delta encoding strategy for integers).
>
> > Implementation could be accomplished by one of the following methods:
> >
> > * Setting a field in Message.custom_metadata
> > * Adding a new field to Message
>
> I think it has to be a new field in Message.  Making it an ignorable
> metadata field means non-supporting receivers will decode and interpret
> the data wrongly.
>
> Regards
>
> Antoine.
>
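
For illustration, the second strategy discussed in the thread (compressing
each non-zero-length buffer separately and prefixing it with its uncompressed
length) could look roughly like the sketch below. This is only a sketch under
stated assumptions, not the Arrow implementation: zlib stands in for ZSTD
because it ships with Python, the prefix is assumed to be little-endian, and
the helper names (compress_buffer, write_body, read_buffer) and the offsets
table are hypothetical. Buffer alignment and padding are ignored.

```python
import struct
import zlib  # stand-in for ZSTD, which is not in the Python standard library

# Assumption: the 8-byte int64 uncompressed-size prefix is little-endian.
PREFIX = struct.Struct("<q")


def compress_buffer(buf: bytes) -> bytes:
    """Compress one buffer and prepend its uncompressed length."""
    if len(buf) == 0:
        return b""  # zero-length buffers are written as-is, without a prefix
    return PREFIX.pack(len(buf)) + zlib.compress(buf)


def decompress_buffer(data: bytes) -> bytes:
    """Reverse compress_buffer, checking the size recorded in the prefix."""
    if len(data) == 0:
        return b""
    (size,) = PREFIX.unpack_from(data, 0)
    out = zlib.decompress(data[PREFIX.size:])
    if len(out) != size:
        raise ValueError("uncompressed size does not match prefix")
    return out


def write_body(buffers):
    """Concatenate compressed buffers; return (body, offsets).

    offsets[i] is the (start, length) of buffer i inside the body.
    """
    body = bytearray()
    offsets = []
    for buf in buffers:
        chunk = compress_buffer(buf)
        offsets.append((len(body), len(chunk)))
        body += chunk
    return bytes(body), offsets


def read_buffer(body: bytes, offsets, index: int) -> bytes:
    """Decompress a single buffer without touching the others."""
    start, length = offsets[index]
    return decompress_buffer(body[start:start + length])
```

Because each buffer is compressed independently, a reader that projects out
only one field can call read_buffer for that field's buffers and leave the
rest of the body untouched, which is the advantage named in the thread.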

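Antoine's remark about using a byte-stream-splitting strategy for floats and
doubles can be illustrated in the same hedged way. The idea is to regroup the
bytes of fixed-width values so that all first bytes come first, then all
second bytes, and so on, before handing the result to a general-purpose
compressor. The function names below are hypothetical and the sketch says
nothing about how such a transform would be signalled in the metadata.

```python
def byte_stream_split(data: bytes, width: int) -> bytes:
    """Regroup fixed-width values into `width` contiguous byte streams."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of the value width")
    # Stream i holds byte i of every value.
    return b"".join(data[i::width] for i in range(width))


def byte_stream_merge(data: bytes, width: int) -> bytes:
    """Inverse of byte_stream_split."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of the value width")
    n = len(data) // width  # number of values
    out = bytearray(len(data))
    for i in range(width):
        # Scatter stream i back to byte position i of every value.
        out[i::width] = data[i * n:(i + 1) * n]
    return bytes(out)
```

For float32 data the width would be 4 and for float64 it would be 8. The
transform only reorders bytes, so it is lossless; whether it helps depends on
the data and on the compressor applied afterwards.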