Thank you for sharing that. Very interesting. I do think decompression speed
is generally more important than compression speed. Another thing to consider
is the possibility of operating directly on the compressed data, e.g. for
filtering: zstd-compressed data has to be decompressed before any filtering,
arithmetic, etc. can be done, whereas I believe at least filtering could be
done on some of these other encodings (a rough sketch of the idea is below).
Apologies if this was already discussed in the meeting.
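
To make that concrete, here is a minimal sketch, assuming a simplified
ALP-like scheme in which each double is stored as a scaled integer
round(value * 10^e) for some per-block exponent e. The encode/decode/filter_gt
helpers and the exponent handling below are my own simplifications for
illustration, not the actual ALP format:

    fn encode(value: f64, e: i32) -> i64 {
        // Store the double as a scaled, rounded integer (simplified assumption).
        (value * 10f64.powi(e)).round() as i64
    }

    fn decode(encoded: i64, e: i32) -> f64 {
        encoded as f64 / 10f64.powi(e)
    }

    // Evaluate `value > threshold` directly on the encoded integers: encode the
    // threshold once, then compare integers. The mapping is monotonic, so this
    // matches comparing the decoded values (modulo rounding right at the
    // boundary, which a real implementation would need to handle carefully).
    fn filter_gt(encoded: &[i64], threshold: f64, e: i32) -> Vec<bool> {
        let t = encode(threshold, e);
        encoded.iter().map(|&v| v > t).collect()
    }

    fn main() {
        let e = 2; // two decimal digits preserved in this block
        let values = [1.25, 3.10, 0.07, 2.50];
        let col: Vec<i64> = values.iter().map(|&v| encode(v, e)).collect();
        // The filter runs without materializing any decoded doubles.
        let mask = filter_gt(&col, 1.30, e);
        assert_eq!(mask, vec![false, true, false, true]);
        assert_eq!(decode(col[0], e), 1.25);
    }

With a general-purpose compressor like zstd there is no such structure to
exploit, so the block has to be decompressed before the same predicate can run.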

> On Oct 16, 2025, at 4:47 PM, PRATEEK GAUR <[email protected]> wrote:
> 
> Hi team,
> 
> We spent some time evaluating ALP compression and decompression compared to
> other encoding alternatives like CHIMP/GORILLA and compression techniques
> like SNAPPY/LZ4/ZSTD. We presented these numbers to the community members
> on October 15th in the biweekly Parquet meeting. (I can't seem to access
> the recording, so please let me know what access I need in order to view
> it.)
> 
> We did this evaluation over some datasets referenced in the ALP paper and
> some suggested by the Parquet community.
> 
> The results are available in the following document:
> https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg
> 
> Based on the numbers, we see:
> 
>   - ALP is comparable to ZSTD(level=1) in terms of compression ratio and
>   much better than the other schemes (the numbers in the sheet are bytes
>   needed to encode each value).
>   - ALP does quite well in terms of decompression speed (the numbers in
>   the sheet are bytes decompressed per second).
> 
> As next steps, we will:
> 
>   - Get the numbers for compression on top of byte stream split.
>   - Evaluate the algorithm over a few more datasets.
>   - Have an implementation in the arrow-parquet repo.
> 
> Looking forward to feedback from the community.
> 
> Best
> Prateek and Dhirhan
