hi Amol,

thanks for pointing that out. Such a heuristic (observing compression
ratios of stream messages) could be implemented at some point so that
compression could be toggled off mid-stream if it doesn't seem to be
helping. Feel free to open a JIRA issue about this

- Wes

On Sat, May 16, 2020 at 6:39 AM Amol Umbarkar <amolumbar...@gmail.com> wrote:
>
> Hello All,
> I was going through dask developer log recently. Dask seems to be
> selectively do compression if it is found to be useful. They sort of pick
> 10kb of sample upfront to calculate compression and if the results are good
> then the whole batch is compressed. This seems to save de-compression
> effort on receiver side.
>
> Please take a look at
> https://blog.dask.org/2016/04/14/dask-distributed-optimizing-protocol#problem-3-unwanted-compression
>
> Thought this could be relevant to arrow batch transfers as well.
>
> Thanks,
> Amol
>
> On Thu, Apr 23, 2020 at 5:54 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Hello,
> >
> > I have proposed adding a simple RecordBatch IPC message body
> > compression scheme (using either LZ4 or ZSTD) to the Arrow IPC
> > protocol in GitHub PR [1] as discussed on the mailing list [2]. This
> > is distinct from separate discussions about adding in-memory encodings
> > (like RLE-encoding) to the Arrow columnar format.
> >
> > This change is not forward compatible so it will not be safe to send
> > compressed messages to old libraries, but since we are still pre-1.0.0
> > the consensus is that this is acceptable. We may separately consider
> > increasing the metadata version for 1.0.0 to require clients to
> > upgrade.
> >
> > Please vote whether to accept the addition. The vote will be open for
> > at least 72 hours.
> >
> > [ ] +1 Accept this addition to the IPC protocol
> > [ ] +0
> > [ ] -1 Do not accept the changes because...
> >
> > Here is my vote: +1
> >
> > Thanks,
> > Wes
> >
> > [1]: https://github.com/apache/arrow/pull/6707
> > [2]:
> > https://lists.apache.org/thread.html/r58c9d23ad159644fca590d8f841df80d180b11bfb72f949d601d764b%40%3Cdev.arrow.apache.org%3E
> >

Reply via email to