Hello All, I was going through the Dask developer log recently. Dask seems to apply compression selectively, only when it is found to be useful: it takes a ~10 kB sample upfront, compresses it, and if the result is good the whole batch is compressed. This also saves decompression effort on the receiver side.
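The sample-then-decide heuristic could be sketched roughly like this (a hypothetical illustration using stdlib `zlib`; not Dask's or Arrow's actual code, and the `SAMPLE_SIZE` / `MIN_RATIO` names and values are my own assumptions):

```python
import zlib

SAMPLE_SIZE = 10 * 1024  # ~10 kB probe, as in the Dask post
MIN_RATIO = 0.9          # only compress if the sample shrinks by >10%

def maybe_compress(data: bytes) -> tuple[bool, bytes]:
    """Compress `data` only if a small sample suggests it is worthwhile.

    Returns (compressed, payload). Hypothetical sketch of the
    sample-then-decide idea; real implementations pick samples and
    thresholds more carefully.
    """
    sample = data[:SAMPLE_SIZE]
    # Cheap probe: if the sample barely shrinks, skip compressing the batch
    # and spare the receiver the decompression work too.
    if len(zlib.compress(sample)) > MIN_RATIO * len(sample):
        return False, data
    return True, zlib.compress(data)

# Repetitive data passes the probe and gets compressed;
# incompressible (e.g. random) data is sent through unchanged.
compressed, payload = maybe_compress(b"a" * 100_000)
```

A fuller version might sample several random slices of the buffer rather than just the head, since real batches are often heterogeneous.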
Please take a look at https://blog.dask.org/2016/04/14/dask-distributed-optimizing-protocol#problem-3-unwanted-compression

I thought this could be relevant to Arrow batch transfers as well.

Thanks,
Amol

On Thu, Apr 23, 2020 at 5:54 AM Wes McKinney <wesmck...@gmail.com> wrote:
> Hello,
>
> I have proposed adding a simple RecordBatch IPC message body
> compression scheme (using either LZ4 or ZSTD) to the Arrow IPC
> protocol in GitHub PR [1] as discussed on the mailing list [2]. This
> is distinct from separate discussions about adding in-memory encodings
> (like RLE encoding) to the Arrow columnar format.
>
> This change is not forward compatible, so it will not be safe to send
> compressed messages to old libraries, but since we are still pre-1.0.0
> the consensus is that this is acceptable. We may separately consider
> increasing the metadata version for 1.0.0 to require clients to
> upgrade.
>
> Please vote on whether to accept the addition. The vote will be open for
> at least 72 hours.
>
> [ ] +1 Accept this addition to the IPC protocol
> [ ] +0
> [ ] -1 Do not accept the changes because...
>
> Here is my vote: +1
>
> Thanks,
> Wes
>
> [1]: https://github.com/apache/arrow/pull/6707
> [2]: https://lists.apache.org/thread.html/r58c9d23ad159644fca590d8f841df80d180b11bfb72f949d601d764b%40%3Cdev.arrow.apache.org%3E