Thanks Adrian.

Yes, that is absolutely correct. Being able to push filters down onto the
encoded data will really help ALP (and a few other schemes) compared with
block compression schemes like ZSTD, which have to be decompressed before
any filtering can be done. That is an added plus of ALP over ZSTD, in
addition to its better decompression speed.
And I agree that in most cases decompression speed is given more weight
than compression speed.
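
To make the filter pushdown point concrete, here is a minimal sketch in
Python. It assumes a heavily simplified ALP-like encoding in which every
value is stored as an integer scaled by a single power of ten. It ignores
exceptions, frame-of-reference and bit-packing, and the function names are
hypothetical, not taken from any existing implementation. Under those
assumptions a "value > threshold" predicate can be evaluated on the encoded
integers directly, which is not possible on a ZSTD-compressed block.

```python
import math
from decimal import Decimal


def encode(values, exponent):
    """Scale each value by 10**exponent and round to an integer.

    Simplified stand-in for a decimal-scaling encoding; assumes every
    value has at most `exponent` decimal digits (no exceptions).
    """
    scale = 10 ** exponent
    return [round(v * scale) for v in values]


def filter_greater_than(encoded, exponent, threshold):
    """Return row indices whose original value is > threshold.

    The threshold is given as a decimal string so it can be scaled
    exactly. Only integers are compared; nothing is decoded.
    """
    # value > t  <=>  d > t * 10**e  <=>  d > floor(t * 10**e) for integer d
    bound = math.floor(Decimal(threshold) * 10 ** exponent)
    return [i for i, d in enumerate(encoded) if d > bound]


values = [19.99, 20.05, 21.5, 18.0]
encoded = encode(values, exponent=2)
matches = filter_greater_than(encoded, exponent=2, threshold="20.0")
print(encoded, matches)
```

A real implementation would also have to check the values stored as
exceptions separately, so treat this only as an illustration of the idea.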

On Thu, Oct 16, 2025 at 5:53 PM Adrian Garcia Badaracco
<[email protected]> wrote:

> Thank you for sharing that. Very interesting. I do think decompression
> speed is generally more important than compression speed. Another thing to
> consider is the possibility of operating on the compressed data e.g. for
> filtering: zstd data for example has to be decompressed before any
> filtering, arithmetic, etc. can be done. I believe at least filtering could
> be done on some of these other encodings. Apologies if this was discussed
> in the meeting already.
>
> > On Oct 16, 2025, at 4:47 PM, PRATEEK GAUR <[email protected]> wrote:
> >
> > Hi team,
> >
> > We spent some time evaluating ALP compression and decompression compared
> > to other encoding alternatives like CHIMP/GORILLA and compression
> > techniques like SNAPPY/LZ4/ZSTD. We presented these numbers to the
> > community members on October 15th in the biweekly parquet meeting. (I
> > can't seem to access the recording, so please let me know what access
> > rules I need to get to be able to view it.)
> >
> > We did this evaluation over some datasets pointed to by the ALP paper
> > and some pointed to by the parquet community.
> >
> > The results are available in the following document:
> > https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg
> >
> > Based on the numbers we see
> >
> >   - ALP is comparable to ZSTD(level=1) in terms of compression ratio and
> >   much better compared to other schemes. (Numbers in the sheet are bytes
> >   needed to encode each value.)
> >   - ALP does quite well in terms of decompression speed. (Numbers in the
> >   sheet are bytes decompressed per second.)
> >
> > As next steps we will
> >
> >   - Get the numbers for compression on top of byte stream split.
> >   - Evaluate the algorithm over a few more datasets.
> >   - Have an implementation in the arrow-parquet repo.
> >
> > Looking forward to feedback from the community.
> >
> > Best
> > Prateek and Dhirhan
>
>
