Hi Amol, thanks for pointing that out. Such a heuristic (observing the compression ratios of stream messages) could be implemented at some point so that compression can be toggled off mid-stream if it doesn't seem to be helping. Feel free to open a JIRA issue about this.
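
For illustration only, a rough sketch of what such a heuristic might look like in Python. This is not Arrow code: zlib from the standard library stands in for LZ4/ZSTD, and the class name, the window size, and the 1.1 ratio threshold are all hypothetical choices made up for the example. It tracks the compression ratio of recent message bodies and stops compressing once the average over a full window drops below the threshold.

```python
import zlib
from collections import deque


class AdaptiveCompressor:
    """Hypothetical helper: disables compression mid-stream when it stops paying off."""

    def __init__(self, min_ratio=1.1, window=8):
        self.min_ratio = min_ratio          # assumed threshold, not from Arrow
        self.ratios = deque(maxlen=window)  # ratios of the most recent messages
        self.enabled = True

    def encode(self, body: bytes):
        """Return (payload, is_compressed) for one message body."""
        if not self.enabled or not body:
            return body, False

        compressed = zlib.compress(body)    # stand-in for LZ4 or ZSTD
        self.ratios.append(len(body) / len(compressed))

        # Only decide once a full window of observations is available.
        if len(self.ratios) == self.ratios.maxlen:
            average = sum(self.ratios) / len(self.ratios)
            if average < self.min_ratio:
                self.enabled = False        # toggle compression off for the rest of the stream

        # Never send a "compressed" body that is not actually smaller.
        if len(compressed) >= len(body):
            return body, False
        return compressed, True
```

The receiver would need a per-message flag telling it whether the body is compressed, which is why `encode` returns one. A variant closer to what the Dask post describes would compress only a small leading sample (around 10 kB) of each body before deciding, rather than compressing the whole body and discarding the result.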

- Wes

On Sat, May 16, 2020 at 6:39 AM Amol Umbarkar <amolumbar...@gmail.com> wrote:
>
> Hello All,
> I was going through dask developer log recently. Dask seems to be
> selectively do compression if it is found to be useful. They sort of pick
> 10kb of sample upfront to calculate compression and if the results are good
> then the whole batch is compressed. This seems to save de-compression
> effort on receiver side.
>
> Please take a look at
> https://blog.dask.org/2016/04/14/dask-distributed-optimizing-protocol#problem-3-unwanted-compression
>
> Thought this could be relevant to arrow batch transfers as well.
>
> Thanks,
> Amol
>
> On Thu, Apr 23, 2020 at 5:54 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Hello,
> >
> > I have proposed adding a simple RecordBatch IPC message body
> > compression scheme (using either LZ4 or ZSTD) to the Arrow IPC
> > protocol in GitHub PR [1] as discussed on the mailing list [2]. This
> > is distinct from separate discussions about adding in-memory encodings
> > (like RLE-encoding) to the Arrow columnar format.
> >
> > This change is not forward compatible so it will not be safe to send
> > compressed messages to old libraries, but since we are still pre-1.0.0
> > the consensus is that this is acceptable. We may separately consider
> > increasing the metadata version for 1.0.0 to require clients to
> > upgrade.
> >
> > Please vote whether to accept the addition. The vote will be open for
> > at least 72 hours.
> >
> > [ ] +1 Accept this addition to the IPC protocol
> > [ ] +0
> > [ ] -1 Do not accept the changes because...
> >
> > Here is my vote: +1
> >
> > Thanks,
> > Wes
> >
> > [1]: https://github.com/apache/arrow/pull/6707
> > [2]:
> > https://lists.apache.org/thread.html/r58c9d23ad159644fca590d8f841df80d180b11bfb72f949d601d764b%40%3Cdev.arrow.apache.org%3E