I updated the Format proposal again; please have a look: https://github.com/apache/arrow/pull/6707
On Wed, Apr 1, 2020 at 10:15 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> For uncompressed, memory mapping is disabled, so all of the bytes are being read into RAM. I wanted to show that even when your IO pipe is very fast (in the case with an NVMe SSD like I have, > 1 GB/s for reads from disk), you can still load faster with compressed files.
>
> Here were the prior read results with:
>
> * Single-threaded decompression
> * Memory mapping enabled
>
> https://ibb.co/4ZncdF8
>
> You can see that for larger chunksizes, because the IPC reconstruction overhead is about 60 microseconds per batch, read time is very low (tens of milliseconds).
>
> On Wed, Apr 1, 2020 at 10:10 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > The read times are still with memory mapping for the uncompressed case? If so, impressive!
> >
> > Regards
> >
> > Antoine.
> >
> > Le 01/04/2020 à 16:44, Wes McKinney a écrit :
> > > Several pieces of work got done in the last few days:
> > >
> > > * Changing from LZ4 raw to the LZ4 frame format (what is recommended for interoperability)
> > > * Parallelizing both compression and decompression at the field level
> > >
> > > Here are the results (using 8 threads on an 8-core laptop). I disabled the "memory map" feature so that in the uncompressed case all of the data must be read off disk into memory. This helps illustrate the compression/IO tradeoff in wall-clock load times.
> > >
> > > File size (only LZ4 may be different): https://ibb.co/CP3VQkp
> > > Read time: https://ibb.co/vz9JZMx
> > > Write time: https://ibb.co/H7bb68T
> > >
> > > In summary, now with multicore compression and decompression, LZ4-compressed files are faster both to read and write even on a very fast SSD, as are ZSTD-compressed files with a low ZSTD compression level. I didn't notice a major difference between the LZ4 raw and LZ4 frame formats.
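[Editor's note: a minimal sketch of the field-level parallelism described above, not the actual Arrow C++ implementation. Because each buffer is compressed independently, the work can be fanned out across a thread pool; `zlib` stands in here for LZ4/ZSTD so the example is runnable with only the standard library.]

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_buffer(buf: bytes) -> bytes:
    # Stand-in codec; the discussion above uses LZ4 frame or ZSTD.
    return zlib.compress(buf)

def compress_body(buffers, num_threads=8):
    # One task per buffer; pool.map preserves the buffer order, which the
    # IPC body layout requires.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(compress_buffer, buffers))

buffers = [bytes(8192), b"abc" * 1000, b"validity-bitmap"]
compressed = compress_body(buffers)
assert [zlib.decompress(c) for c in compressed] == buffers
```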
> > > The reads and writes could be made faster still by pipelining the disk read/write and compression/decompression steps (making them concurrent) -- the current implementation performs these tasks serially. We can improve this in the near future.
> > >
> > > I'll update the Format proposal this week so we can move toward something we can vote on. I would recommend that we await implementations and integration tests for this before releasing it as stable, in line with prior discussions about adding things to the IPC protocol.
> > >
> > > On Thu, Mar 26, 2020 at 4:57 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>
> > >> Here are the results:
> > >>
> > >> File size: https://ibb.co/71sBsg3
> > >> Read time: https://ibb.co/4ZncdF8
> > >> Write time: https://ibb.co/xhNkRS2
> > >>
> > >> Code: https://github.com/wesm/notebooks/blob/master/20190919file_benchmarks/FeatherCompression.ipynb (based on https://github.com/apache/arrow/pull/6694)
> > >>
> > >> High-level summary:
> > >>
> > >> * Chunksize 1024 vs. 64K has relatively limited impact on file sizes.
> > >>
> > >> * Wall-clock read time is impacted by chunksize, with maybe a 30-40% difference between 1K-row chunks and 16K-row chunks. One notable thing is that you can clearly see the overhead associated with IPC reconstruction even when the data is memory-mapped. For example, in the Fannie Mae dataset there are 21,661 batches (each batch has 31 fields) when the chunksize is 1024, so a read time of 1.3 seconds indicates ~60 microseconds of overhead for each record batch. When you consider the amount of business logic involved in reconstructing a record batch, 60 microseconds is pretty good. This also shows that every microsecond counts and we need to be carefully tracking microperformance in this critical operation.
> > >>
> > >> * A small chunksize results in higher write times for "expensive" codecs like ZSTD with a high compression ratio. For "cheap" codecs like LZ4 it doesn't make as much of a difference.
> > >>
> > >> * Note that the LZ4 compressor results in faster wall-clock time to disk, presumably because its compression speed is faster than my SSD's write speed.
> > >>
> > >> Implementation notes:
> > >>
> > >> * There is no parallelization or pipelining of reads or writes. For example, on write, all of the buffers are compressed with a single thread and then compression stops until the write to disk completes. On read, buffers are decompressed serially.
> > >>
> > >> On Thu, Mar 26, 2020 at 12:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>
> > >>> I'll run a grid of batch sizes (from 1024 to 64K or 128K) and let you know the read/write times and compression ratios. Shouldn't take too long.
> > >>>
> > >>> On Wed, Mar 25, 2020 at 10:37 PM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>
> > >>>> Thanks a lot for sharing the good results.
> > >>>>
> > >>>> As investigated by Wes, we have an existing zstd library for Java (zstd-jni) [1] and an lz4 library for Java (lz4-java) [2].
> > >>>> +1 for the 1024 batch size, as it represents an important scenario where the batch fits into the L1 cache (IMO).
> > >>>>
> > >>>> Best,
> > >>>> Liya Fan
> > >>>>
> > >>>> [1] https://github.com/luben/zstd-jni
> > >>>> [2] https://github.com/lz4/lz4-java
> > >>>>
> > >>>> On Thu, Mar 26, 2020 at 2:38 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>>>>
> > >>>>> If it isn't hard, could you run with batch sizes of 1024 or 2048 records? I think a question was previously raised about whether there is a benefit to smaller buffer sizes.
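[Editor's note: a hypothetical sketch of the pipelining improvement the implementation notes above say is missing -- overlapping compression with the disk write via a bounded queue and a writer thread. `zlib` stands in for LZ4/ZSTD, and `sink` is any file-like object; this is not the Arrow C++ code.]

```python
import io
import queue
import threading
import zlib

def pipelined_write(batches, sink):
    q = queue.Queue(maxsize=4)          # bounded queue limits memory in flight

    def writer():
        while True:
            chunk = q.get()
            if chunk is None:           # sentinel: no more compressed batches
                return
            sink.write(chunk)

    t = threading.Thread(target=writer)
    t.start()
    for batch in batches:
        # Compression of batch N overlaps the disk write of batch N-1,
        # instead of stopping until the write completes.
        q.put(zlib.compress(batch))
    q.put(None)
    t.join()

sink = io.BytesIO()
pipelined_write([b"x" * 65536, b"y" * 65536], sink)
```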
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Micah
> > >>>>>
> > >>>>> On Wed, Mar 25, 2020 at 8:59 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>
> > >>>>>> On Tue, Mar 24, 2020 at 9:22 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae dataset. So that's a huge space savings
> > >>>>>>>
> > >>>>>>> One more question on this. What was the average row-batch size used? I see in the proposal that some buffers might not be compressed; did you use this feature in the test?
> > >>>>>>
> > >>>>>> I used a 64K row batch size. I haven't implemented the optional non-compressed buffers (for cases where there is little space savings), so everything is compressed. I can check different batch sizes if you like.
> > >>>>>>
> > >>>>>>> On Mon, Mar 23, 2020 at 4:40 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> hi folks,
> > >>>>>>>>
> > >>>>>>>> Sorry it's taken me a little while to produce supporting benchmarks.
> > >>>>>>>>
> > >>>>>>>> * I implemented experimental trivial body buffer compression in https://github.com/apache/arrow/pull/6638
> > >>>>>>>> * I hooked up the Arrow IPC file format with compression as the new Feather V2 format in https://github.com/apache/arrow/pull/6694#issuecomment-602906476
> > >>>>>>>>
> > >>>>>>>> I tested a couple of real-world datasets from a prior blog post (https://ursalabs.org/blog/2019-10-columnar-perf/) with the ZSTD and LZ4 codecs.
> > >>>>>>>>
> > >>>>>>>> The complete results are here: https://github.com/apache/arrow/pull/6694#issuecomment-602906476
> > >>>>>>>>
> > >>>>>>>> Summary:
> > >>>>>>>>
> > >>>>>>>> * Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae dataset. So that's a huge space savings
> > >>>>>>>> * Single-threaded decompression speeds exceeding 2-4 GByte/s with LZ4 and 1.2-3 GByte/s with ZSTD
> > >>>>>>>>
> > >>>>>>>> I would have to do some more engineering to test throughput changes with Flight, but given these results, on slower networking (e.g. 1 Gigabit) my guess is that the compression and decompression overhead is small compared with the time savings due to the high compression ratios. If people would like to see these numbers to help make a decision, I can take a closer look.
> > >>>>>>>>
> > >>>>>>>> As far as what Micah said about having a limited number of compressors: I would be in favor of having just LZ4 and ZSTD. It seems, anecdotally, that these outperform Snappy in most real-world scenarios and generally have > 1 GB/s decompression performance.
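[Editor's note: a sketch of how a single-threaded decompression-throughput figure like the ones quoted above can be measured. `zlib` stands in for LZ4/ZSTD, so the absolute numbers will be much lower than the 1-4 GB/s reported for those codecs.]

```python
import time
import zlib

data = b"0123456789" * 1_000_000          # ~10 MB of compressible input
compressed = zlib.compress(data)

start = time.perf_counter()
out = zlib.decompress(compressed)
elapsed = time.perf_counter() - start

assert out == data
# Throughput is conventionally measured against the *uncompressed* size.
print(f"{len(data) / elapsed / 1e9:.2f} GB/s single-threaded")
```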
> > >>>>>>>> Some Linux distributions (Arch, at least) have already started adopting ZSTD over LZMA or GZIP [1].
> > >>>>>>>>
> > >>>>>>>> - Wes
> > >>>>>>>>
> > >>>>>>>> [1]: https://www.archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/
> > >>>>>>>>
> > >>>>>>>> On Fri, Mar 6, 2020 at 8:42 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi Wes,
> > >>>>>>>>>
> > >>>>>>>>> Thanks a lot for the additional information. Looking forward to seeing the good results from your experiments.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Liya Fan
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Mar 5, 2020 at 11:42 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> I see, thank you.
> > >>>>>>>>>>
> > >>>>>>>>>> For such a scenario, implementations would need to define a "UserDefinedCodec" interface to enable codecs to be registered from third-party code, similar to what is done for extension types [1].
> > >>>>>>>>>>
> > >>>>>>>>>> I'll update this thread when I get my experimental C++ patch up, to show what I'm thinking at least for the built-in codecs we have like ZSTD.
> > >>>>>>>>>>
> > >>>>>>>>>> [1]: https://github.com/apache/arrow/blob/apache-arrow-0.16.0/docs/source/format/Columnar.rst#extension-types
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Mar 5, 2020 at 7:56 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi Wes,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks a lot for your further clarification.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Some of my preliminary thoughts:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1. We assign a unique GUID to each pair of compression/decompression strategies. The GUID is stored as part of the Message.custom_metadata.
> > >>>>>>>>>>> When receiving the GUID, the receiver knows which decompression strategy to use.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2. We serialize the decompression strategy and store it in the Message.custom_metadata. The receiver can decompress the data after deserializing the strategy.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Method 1 is generally used in static-strategy scenarios, while method 2 is generally used in dynamic-strategy scenarios.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Mar 4, 2020 at 11:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Okay, I guess my question is how the receiver is going to be able to determine how to "rehydrate" the record batch buffers.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> What I've proposed amounts to the following:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * UNCOMPRESSED: the current behavior
> > >>>>>>>>>>>> * ZSTD/LZ4/...: each buffer is compressed and written with an int64 length prefix
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> (I'm close to putting up a PR implementing an experimental version of this that uses Message.custom_metadata to transmit the codec, so this will make the implementation details more concrete.)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> So in the USER_DEFINED case, how will the library know how to obtain the uncompressed buffer? Is some additional metadata structure required to provide instructions?
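[Editor's note: a hypothetical sketch of how a "UserDefinedCodec" registry along the lines of method 1 above could work -- this is not an actual Arrow API. A codec identifier (a GUID in Liya Fan's scheme) carried in Message.custom_metadata lets the receiver look up a decompression function that third-party code registered earlier; `zlib` stands in for a custom strategy.]

```python
import zlib

_CODECS = {}  # codec identifier -> (compress, decompress)

def register_codec(codec_id, compress, decompress):
    # Called by third-party code; both sides must agree on codec_id.
    _CODECS[codec_id] = (compress, decompress)

def decompress_buffer(codec_id, payload):
    # codec_id would be read from Message.custom_metadata on the receiver.
    if codec_id not in _CODECS:
        raise ValueError(f"no codec registered for {codec_id!r}")
    return _CODECS[codec_id][1](payload)

register_codec("my-custom-codec", zlib.compress, zlib.decompress)
assert decompress_buffer("my-custom-codec", zlib.compress(b"data")) == b"data"
```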
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Mar 4, 2020 at 8:05 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi Wes,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I am thinking of adding an option named "USER_DEFINED" (or something similar) to the enum CompressionType in your proposal. IMO, this option should be used primarily in Flight.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Mar 4, 2020 at 11:12 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Mar 3, 2020, 8:11 PM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Sure. I agree with you that we should not overdo this. I am wondering if we should provide an option to allow users to plug in their customized compression strategies.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Can you provide a patch showing changes to Message.fbs (or Schema.fbs) that make this idea more concrete?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 9:47 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Tue, Mar 3, 2020, 7:36 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I am so glad to see this discussion, and I am willing to provide help from the Java side.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> In the proposal, I see support for basic compression strategies (e.g. gzip, snappy). IMO, applying a single basic strategy is not likely to achieve a performance improvement for most scenarios. The optimal compression strategy is often obtained by composing basic strategies and tuning parameters.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I hope we can support such highly customized compression strategies.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I think anything much beyond trivial one-shot buffer-level compression is probably out of the question for addition to the current "RecordBatch" Flatbuffers type, because the additional metadata would add undesirable bloat (which I would be against). If people have other ideas, it would be great to see exactly what you are thinking as far as changes to the protocol files.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I'll try to assemble some examples to show the before/after results of applying the simple strategy.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Liya Fan
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, Mar 3, 2020 at 8:15 PM Antoine Pitrou <anto...@python.org> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> If we want to use an HTTP header, it would be more of an Accept-Encoding header, no?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> In any case, we would have to put non-standard values there (e.g. lz4), so I'm not sure how desirable it is to repurpose HTTP headers for that, rather than add some dedicated field to the Flight messages.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Regards
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Antoine.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Le 03/03/2020 à 12:52, David Li a écrit :
> > >>>>>>>>>>>>>>>>>>> gRPC supports headers, so for Flight we could send essentially an Accept header and perhaps a Content-Type header.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> David
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Mon, Mar 2, 2020, 23:15 Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi Wes,
> > >>>>>>>>>>>>>>>>>>>> A few thoughts on this. In general, I think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>> But before proceeding, I think the following points are worth discussing:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> 1. Does this actually improve throughput/latency for Flight? (I think you mentioned you would follow up with benchmarks.)
> > >>>>>>>>>>>>>>>>>>>> 2. I think we should limit the number of supported compression schemes to only 1 or 2. I think the criteria for selection are speed and native implementations available across the widest possible range of languages. As far as I can tell, zstd only has bindings in Java via JNI, but my understanding is it is probably the right type of compression for our use-cases. So I think zstd + potentially 1 more.
> > >>>>>>>>>>>>>>>>>>>> 3. Commitment from someone on the Java side to implement this.
> > >>>>>>>>>>>>>>>>>>>> 4. This doesn't need to be coupled with this change per se, but for something like Flight it would be good to have a standard mechanism for negotiating server/client capabilities (e.g. the client doesn't support compression or only supports a subset).
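[Editor's note: an illustrative sketch of the capability negotiation in point 4 -- not Flight's actual mechanism, which was still under discussion in this thread. The client advertises the codecs it can decode, and the server picks one it also supports, falling back to uncompressed. The codec names are assumptions for illustration.]

```python
# Codecs the server can produce, in order of preference.
SERVER_CODECS = ["zstd", "lz4"]

def negotiate(client_accepts):
    # Pick the first server-preferred codec the client also accepts;
    # "uncompressed" is always a valid fallback for older clients.
    for codec in SERVER_CODECS:
        if codec in client_accepts:
            return codec
    return "uncompressed"

assert negotiate(["lz4"]) == "lz4"
assert negotiate([]) == "uncompressed"
```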
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>> Micah
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 1:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Sun, Mar 1, 2020 at 3:14 PM Antoine Pitrou <anto...@python.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Le 01/03/2020 à 22:01, Wes McKinney a écrit :
> > >>>>>>>>>>>>>>>>>>>>>>> In the context of a "next version of the Feather format" ARROW-5510 (which is consumed only by Python and R at the moment), I have been looking at compressing buffers using fast compressors like ZSTD when writing the RecordBatch bodies. This could be handled privately as an implementation detail of the Feather file, but since ZSTD compression could improve throughput in Flight, for example, I thought I would bring it up for discussion.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> I can see two simple compression strategies:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> * Compress the entire message body in one shot, writing the result out with an 8-byte int64 prefix indicating the uncompressed size
> > >>>>>>>>>>>>>>>>>>>>>>> * Compress each non-zero-length constituent Buffer prior to writing it to the body (using the same uncompressed-length prefix when writing the compressed buffer)
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> The latter strategy is preferable for scenarios where we may project out only a few fields from a larger record batch (such as reading from a memory-mapped file).
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Agreed. It may also allow using different compression strategies for different kinds of buffers (for example, a bytestream-splitting strategy for floats and doubles, or a delta-encoding strategy for integers).
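[Editor's note: a minimal sketch of the second (per-buffer) strategy above, assuming a little-endian int64 uncompressed-length prefix; `zlib` stands in for ZSTD/LZ4 so the example runs with only the standard library. This is an illustration of the proposal, not the final Arrow format.]

```python
import struct
import zlib

def compress_with_prefix(buf: bytes) -> bytes:
    # The 8-byte int64 prefix carries the uncompressed size, so a reader
    # can preallocate the output buffer before decompressing.
    return struct.pack("<q", len(buf)) + zlib.compress(buf)

def decompress_with_prefix(data: bytes) -> bytes:
    (uncompressed_size,) = struct.unpack("<q", data[:8])
    out = zlib.decompress(data[8:])
    assert len(out) == uncompressed_size
    return out

# Each non-zero-length buffer in the body is compressed independently,
# which allows projecting out a few fields without touching the rest.
body = [b"a" * 10_000, b"b" * 5_000]
encoded = [compress_with_prefix(b) for b in body]
assert [decompress_with_prefix(e) for e in encoded] == body
```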
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> If we wanted to allow for different compression to apply to different buffers, I think we will need a new Message type, because this would inflate metadata sizes in a way that is not likely to be acceptable for the current uncompressed use case.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Here is my strawman proposal: https://github.com/apache/arrow/compare/master...wesm:compression-strawman
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Implementation could be accomplished by one of the following methods:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> * Setting a field in Message.custom_metadata
> > >>>>>>>>>>>>>>>>>>>>>>> * Adding a new field to Message
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I think it has to be a new field in Message. Making it an ignorable metadata field means non-supporting receivers will decode and interpret the data wrongly.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Regards
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Antoine.