I updated the Format proposal again; please have a look: https://github.com/apache/arrow/pull/6707
On Wed, Apr 1, 2020 at 10:15 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> For uncompressed, memory mapping is disabled, so all of the bytes are being read into RAM. I wanted to show that even when your IO pipe is very fast (in the case with an NVMe SSD like I have, > 1 GB/s for reads from disk), you can still load faster with compressed files.
>
> Here were the prior read results with:
>
> * Single-threaded decompression
> * Memory mapping enabled
>
> https://ibb.co/4ZncdF8
>
> You can see that for larger chunksizes, because the IPC reconstruction overhead is about 60 microseconds per batch, read time is very low (tens of milliseconds).
>
> On Wed, Apr 1, 2020 at 10:10 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > The read times are still with memory mapping for the uncompressed case? If so, impressive!
> >
> > Regards
> >
> > Antoine.
> >
> > Le 01/04/2020 à 16:44, Wes McKinney a écrit :
> > > Several pieces of work got done in the last few days:
> > >
> > > * Changing from LZ4 raw to the LZ4 frame format (what is recommended for interoperability)
> > > * Parallelizing both compression and decompression at the field level
> > >
> > > Here are the results (using 8 threads on an 8-core laptop). I disabled the "memory map" feature so that in the uncompressed case all of the data must be read off disk into memory. This helps illustrate the compression/IO tradeoff in wall-clock load times.
> > >
> > > File size (only LZ4 may be different): https://ibb.co/CP3VQkp
> > > Read time: https://ibb.co/vz9JZMx
> > > Write time: https://ibb.co/H7bb68T
> > >
> > > In summary, now with multicore compression and decompression, LZ4-compressed files are faster both to read and write even on a very fast SSD, as are ZSTD-compressed files with a low ZSTD compression level. I didn't notice a major difference between the LZ4 raw and LZ4 frame formats.
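[Editor's note: a minimal sketch of the field-level parallelism described above, not the actual Arrow C++ implementation. Because each buffer is compressed independently, the work can be fanned out across a thread pool; `zlib` stands in here for LZ4/ZSTD so the example is runnable with only the standard library.]

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_buffer(buf: bytes) -> bytes:
    # Stand-in codec; the discussion above uses LZ4 frame or ZSTD.
    return zlib.compress(buf)

def compress_body(buffers, num_threads=8):
    # One task per buffer; pool.map preserves the buffer order, which the
    # IPC body layout requires.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(compress_buffer, buffers))

buffers = [bytes(8192), b"abc" * 1000, b"validity-bitmap"]
compressed = compress_body(buffers)
assert [zlib.decompress(c) for c in compressed] == buffers
```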
> > > The reads and writes could be made faster still by pipelining the disk read/write and compression/decompression steps (making them concurrent) -- the current implementation performs these tasks serially. We can improve this in the near future.
> > >
> > > I'll update the Format proposal this week so we can move toward something we can vote on. I would recommend that we await implementations and integration tests for this before releasing it as stable, in line with prior discussions about adding things to the IPC protocol.
> > >
> > > On Thu, Mar 26, 2020 at 4:57 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>
> > >> Here are the results:
> > >>
> > >> File size: https://ibb.co/71sBsg3
> > >> Read time: https://ibb.co/4ZncdF8
> > >> Write time: https://ibb.co/xhNkRS2
> > >>
> > >> Code: https://github.com/wesm/notebooks/blob/master/20190919file_benchmarks/FeatherCompression.ipynb (based on https://github.com/apache/arrow/pull/6694)
> > >>
> > >> High-level summary:
> > >>
> > >> * Chunksize 1024 vs. 64K has relatively limited impact on file sizes.
> > >>
> > >> * Wall-clock read time is impacted by chunksize, with maybe a 30-40% difference between 1K-row chunks and 16K-row chunks. One notable thing is that you can clearly see the overhead associated with IPC reconstruction even when the data is memory-mapped. For example, in the Fannie Mae dataset there are 21,661 batches (each batch has 31 fields) when the chunksize is 1024, so a read time of 1.3 seconds indicates ~60 microseconds of overhead for each record batch. When you consider the amount of business logic involved in reconstructing a record batch, 60 microseconds is pretty good. This also shows that every microsecond counts and we need to be carefully tracking microperformance in this critical operation.
> > >>
> > >> * A small chunksize results in higher write times for "expensive" codecs like ZSTD with a high compression ratio. For "cheap" codecs like LZ4 it doesn't make as much of a difference.
> > >>
> > >> * Note that the LZ4 compressor results in faster wall-clock time to disk, presumably because its compression speed is faster than my SSD's write speed.
> > >>
> > >> Implementation notes:
> > >>
> > >> * There is no parallelization or pipelining of reads or writes. For example, on write, all of the buffers are compressed with a single thread and then compression stops until the write to disk completes. On read, buffers are decompressed serially.
> > >>
> > >> On Thu, Mar 26, 2020 at 12:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>
> > >>> I'll run a grid of batch sizes (from 1024 to 64K or 128K) and let you know the read/write times and compression ratios. Shouldn't take too long.
> > >>>
> > >>> On Wed, Mar 25, 2020 at 10:37 PM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>
> > >>>> Thanks a lot for sharing the good results.
> > >>>>
> > >>>> As investigated by Wes, we have an existing zstd library for Java (zstd-jni) [1] and an lz4 library for Java (lz4-java) [2].
> > >>>> +1 for the 1024 batch size, as it represents an important scenario where the batch fits into the L1 cache (IMO).
> > >>>>
> > >>>> Best,
> > >>>> Liya Fan
> > >>>>
> > >>>> [1] https://github.com/luben/zstd-jni
> > >>>> [2] https://github.com/lz4/lz4-java
> > >>>>
> > >>>> On Thu, Mar 26, 2020 at 2:38 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>>>>
> > >>>>> If it isn't hard, could you run with batch sizes of 1024 or 2048 records? I think a question was previously raised about whether there is a benefit to smaller buffer sizes.
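[Editor's note: a hypothetical sketch of the pipelining improvement the implementation notes above say is missing -- overlapping compression with the disk write via a bounded queue and a writer thread. `zlib` stands in for LZ4/ZSTD, and `sink` is any file-like object; this is not the Arrow C++ code.]

```python
import io
import queue
import threading
import zlib

def pipelined_write(batches, sink):
    q = queue.Queue(maxsize=4)          # bounded queue limits memory in flight

    def writer():
        while True:
            chunk = q.get()
            if chunk is None:           # sentinel: no more compressed batches
                return
            sink.write(chunk)

    t = threading.Thread(target=writer)
    t.start()
    for batch in batches:
        # Compression of batch N overlaps the disk write of batch N-1,
        # instead of stopping until the write completes.
        q.put(zlib.compress(batch))
    q.put(None)
    t.join()

sink = io.BytesIO()
pipelined_write([b"x" * 65536, b"y" * 65536], sink)
```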
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Micah
> > >>>>>
> > >>>>> On Wed, Mar 25, 2020 at 8:59 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>
> > >>>>>> On Tue, Mar 24, 2020 at 9:22 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae dataset. So that's a huge space savings
> > >>>>>>>
> > >>>>>>> One more question on this. What was the average row-batch size used? I see in the proposal that some buffers might not be compressed; did you use this feature in the test?
> > >>>>>>
> > >>>>>> I used a 64K row batch size. I haven't implemented the optional non-compressed buffers (for cases where there is little space savings), so everything is compressed. I can check different batch sizes if you like.
> > >>>>>>
> > >>>>>>> On Mon, Mar 23, 2020 at 4:40 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> hi folks,
> > >>>>>>>>
> > >>>>>>>> Sorry it's taken me a little while to produce supporting benchmarks.
> > >>>>>>>>
> > >>>>>>>> * I implemented experimental trivial body buffer compression in https://github.com/apache/arrow/pull/6638
> > >>>>>>>> * I hooked up the Arrow IPC file format with compression as the new Feather V2 format in https://github.com/apache/arrow/pull/6694#issuecomment-602906476
> > >>>>>>>>
> > >>>>>>>> I tested a couple of real-world datasets from a prior blog post (https://ursalabs.org/blog/2019-10-columnar-perf/) with the ZSTD and LZ4 codecs.
> > >>>>>>>>
> > >>>>>>>> The complete results are here: https://github.com/apache/arrow/pull/6694#issuecomment-602906476
> > >>>>>>>>
> > >>>>>>>> Summary:
> > >>>>>>>>
> > >>>>>>>> * Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae dataset. So that's a huge space savings
> > >>>>>>>> * Single-threaded decompression speeds exceeding 2-4 GByte/s with LZ4 and 1.2-3 GByte/s with ZSTD
> > >>>>>>>>
> > >>>>>>>> I would have to do some more engineering to test throughput changes with Flight, but given these results, on slower networking (e.g. 1 Gigabit) my guess is that the compression and decompression overhead is small compared with the time savings due to the high compression ratios. If people would like to see these numbers to help make a decision, I can take a closer look.
> > >>>>>>>>
> > >>>>>>>> As far as what Micah said about having a limited number of compressors: I would be in favor of having just LZ4 and ZSTD. It seems, anecdotally, that these outperform Snappy in most real-world scenarios and generally have > 1 GB/s decompression performance.
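[Editor's note: a sketch of how a single-threaded decompression-throughput figure like the ones quoted above can be measured. `zlib` stands in for LZ4/ZSTD, so the absolute numbers will be much lower than the 1-4 GB/s reported for those codecs.]

```python
import time
import zlib

data = b"0123456789" * 1_000_000          # ~10 MB of compressible input
compressed = zlib.compress(data)

start = time.perf_counter()
out = zlib.decompress(compressed)
elapsed = time.perf_counter() - start

assert out == data
# Throughput is conventionally measured against the *uncompressed* size.
print(f"{len(data) / elapsed / 1e9:.2f} GB/s single-threaded")
```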
> > >>>>>>>> Some Linux distributions (Arch, at least) have already started adopting ZSTD over LZMA or GZIP [1].
> > >>>>>>>>
> > >>>>>>>> - Wes
> > >>>>>>>>
> > >>>>>>>> [1]: https://www.archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/
> > >>>>>>>>
> > >>>>>>>> On Fri, Mar 6, 2020 at 8:42 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi Wes,
> > >>>>>>>>>
> > >>>>>>>>> Thanks a lot for the additional information. Looking forward to seeing the good results from your experiments.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Liya Fan
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Mar 5, 2020 at 11:42 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> I see, thank you.
> > >>>>>>>>>>
> > >>>>>>>>>> For such a scenario, implementations would need to define a "UserDefinedCodec" interface to enable codecs to be registered from third-party code, similar to what is done for extension types [1].
> > >>>>>>>>>>
> > >>>>>>>>>> I'll update this thread when I get my experimental C++ patch up, to show what I'm thinking at least for the built-in codecs we have like ZSTD.
> > >>>>>>>>>>
> > >>>>>>>>>> [1]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/docs/source/format/Columnar.rst#extension-types
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Mar 5, 2020 at 7:56 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi Wes,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks a lot for your further clarification.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Some of my preliminary thoughts:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1. We assign a unique GUID to each pair of compression/decompression strategies. The GUID is stored as part of the Message.custom_metadata.
> > >>>>>>>>>>> When receiving the GUID, the receiver knows which decompression strategy to use.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2. We serialize the decompression strategy and store it in the Message.custom_metadata. The receiver can decompress the data after deserializing the strategy.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Method 1 is generally used in static-strategy scenarios, while method 2 is generally used in dynamic-strategy scenarios.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Mar 4, 2020 at 11:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Okay, I guess my question is how the receiver is going to be able to determine how to "rehydrate" the record batch buffers.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> What I've proposed amounts to the following:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * UNCOMPRESSED: the current behavior
> > >>>>>>>>>>>> * ZSTD/LZ4/...: each buffer is compressed and written with an int64 length prefix
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> (I'm close to putting up a PR implementing an experimental version of this that uses Message.custom_metadata to transmit the codec, so this will make the implementation details more concrete.)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> So in the USER_DEFINED case, how will the library know how to obtain the uncompressed buffer? Is some additional metadata structure required to provide instructions?
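[Editor's note: a hypothetical sketch of how a "UserDefinedCodec" registry along the lines of method 1 above could work -- this is not an actual Arrow API. A codec identifier (a GUID in Liya Fan's scheme) carried in Message.custom_metadata lets the receiver look up a decompression function that third-party code registered earlier; `zlib` stands in for a custom strategy.]

```python
import zlib

_CODECS = {}  # codec identifier -> (compress, decompress)

def register_codec(codec_id, compress, decompress):
    # Called by third-party code; both sides must agree on codec_id.
    _CODECS[codec_id] = (compress, decompress)

def decompress_buffer(codec_id, payload):
    # codec_id would be read from Message.custom_metadata on the receiver.
    if codec_id not in _CODECS:
        raise ValueError(f"no codec registered for {codec_id!r}")
    return _CODECS[codec_id][1](payload)

register_codec("my-custom-codec", zlib.compress, zlib.decompress)
assert decompress_buffer("my-custom-codec", zlib.compress(b"data")) == b"data"
```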
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Mar 4, 2020 at 8:05 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi Wes,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I am thinking of adding an option named "USER_DEFINED" (or something similar) to the enum CompressionType in your proposal. IMO, this option should be used primarily in Flight.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Mar 4, 2020 at 11:12 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Mar 3, 2020, 8:11 PM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Sure. I agree with you that we should not overdo this. I am wondering if we should provide an option to allow users to plug in their customized compression strategies.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Can you provide a patch showing changes to Message.fbs (or Schema.fbs) that make this idea more concrete?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 9:47 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Tue, Mar 3, 2020, 7:36 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I am so glad to see this discussion, and I am willing to provide help from the Java side.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> In the proposal, I see support for basic compression strategies (e.g. gzip, snappy). IMO, applying a single basic strategy is not likely to achieve a performance improvement for most scenarios. The optimal compression strategy is often obtained by composing basic strategies and tuning parameters.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I hope we can support such highly customized compression strategies.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I think anything much beyond trivial one-shot buffer-level compression is probably out of the question for addition to the current "RecordBatch" Flatbuffers type, because the additional metadata would add undesirable bloat (which I would be against). If people have other ideas, it would be great to see exactly what you are thinking as far as changes to the protocol files.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I'll try to assemble some examples to show the before/after results of applying the simple strategy.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 8:15 PM Antoine Pitrou <anto...@python.org> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> If we want to use an HTTP header, it would be more of an Accept-Encoding header, no?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> In any case, we would have to put non-standard values there (e.g. lz4), so I'm not sure how desirable it is to repurpose HTTP headers for that, rather than add some dedicated field to the Flight messages.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Regards
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Antoine.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Le 03/03/2020 à 12:52, David Li a écrit :
> > >>>>>>>>>>>>>>>>>>> gRPC supports headers, so for Flight we could send essentially an Accept header and perhaps a Content-Type header.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> David
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Mon, Mar 2, 2020, 23:15 Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi Wes,
> > >>>>>>>>>>>>>>>>>>>> A few thoughts on this. In general, I think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>> But before proceeding, I think the following points are worth discussing:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> 1. Does this actually improve throughput/latency for Flight? (I think you mentioned you would follow up with benchmarks.)
> > >>>>>>>>>>>>>>>>>>>> 2. I think we should limit the number of supported compression schemes to only 1 or 2. I think the criteria for selection are speed and native implementations available across the widest possible range of languages. As far as I can tell, zstd only has bindings in Java via JNI, but my understanding is it is probably the right type of compression for our use-cases. So I think zstd + potentially 1 more.
> > >>>>>>>>>>>>>>>>>>>> 3. Commitment from someone on the Java side to implement this.
> > >>>>>>>>>>>>>>>>>>>> 4. This doesn't need to be coupled with this change per se, but for something like Flight it would be good to have a standard mechanism for negotiating server/client capabilities (e.g. the client doesn't support compression or only supports a subset).
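[Editor's note: an illustrative sketch of the capability negotiation in point 4 -- not Flight's actual mechanism, which was still under discussion in this thread. The client advertises the codecs it can decode, and the server picks one it also supports, falling back to uncompressed. The codec names are assumptions for illustration.]

```python
# Codecs the server can produce, in order of preference.
SERVER_CODECS = ["zstd", "lz4"]

def negotiate(client_accepts):
    # Pick the first server-preferred codec the client also accepts;
    # "uncompressed" is always a valid fallback for older clients.
    for codec in SERVER_CODECS:
        if codec in client_accepts:
            return codec
    return "uncompressed"

assert negotiate(["lz4"]) == "lz4"
assert negotiate([]) == "uncompressed"
```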
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>> Micah
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 1:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 3:14 PM Antoine Pitrou <anto...@python.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Le 01/03/2020 à 22:01, Wes McKinney a écrit :
> > >>>>>>>>>>>>>>>>>>>>>>> In the context of a "next version of the Feather format" ARROW-5510 (which is consumed only by Python and R at the moment), I have been looking at compressing buffers using fast compressors like ZSTD when writing the RecordBatch bodies. This could be handled privately as an implementation detail of the Feather file, but since ZSTD compression could improve throughput in Flight, for example, I thought I would bring it up for discussion.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> I can see two simple compression strategies:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> * Compress the entire message body in one shot, writing the result out with an 8-byte int64 prefix indicating the uncompressed size
> > >>>>>>>>>>>>>>>>>>>>>>> * Compress each non-zero-length constituent Buffer prior to writing it to the body (using the same uncompressed-length prefix when writing the compressed buffer)
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> The latter strategy is preferable for scenarios where we may project out only a few fields from a larger record batch (such as reading from a memory-mapped file).
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Agreed. It may also allow using different compression strategies for different kinds of buffers (for example, a bytestream-splitting strategy for floats and doubles, or a delta-encoding strategy for integers).
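[Editor's note: a minimal sketch of the second (per-buffer) strategy above, assuming a little-endian int64 uncompressed-length prefix; `zlib` stands in for ZSTD/LZ4 so the example runs with only the standard library. This is an illustration of the proposal, not the final Arrow format.]

```python
import struct
import zlib

def compress_with_prefix(buf: bytes) -> bytes:
    # The 8-byte int64 prefix carries the uncompressed size, so a reader
    # can preallocate the output buffer before decompressing.
    return struct.pack("<q", len(buf)) + zlib.compress(buf)

def decompress_with_prefix(data: bytes) -> bytes:
    (uncompressed_size,) = struct.unpack("<q", data[:8])
    out = zlib.decompress(data[8:])
    assert len(out) == uncompressed_size
    return out

# Each non-zero-length buffer in the body is compressed independently,
# which allows projecting out a few fields without touching the rest.
body = [b"a" * 10_000, b"b" * 5_000]
encoded = [compress_with_prefix(b) for b in body]
assert [decompress_with_prefix(e) for e in encoded] == body
```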
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> If we wanted to allow for different compression to apply to different buffers, I think we will need a new Message type, because this would inflate metadata sizes in a way that is not likely to be acceptable for the current uncompressed use case.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Here is my strawman proposal: https://github.com/apache/arrow/compare/master...wesm:compression-strawman
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Implementation could be accomplished by one of the following methods:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> * Setting a field in Message.custom_metadata
> > >>>>>>>>>>>>>>>>>>>>>>> * Adding a new field to Message
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I think it has to be a new field in Message. Making it an ignorable metadata field means non-supporting receivers will decode and interpret the data wrongly.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Regards
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Antoine.