Thanks Adrian. Yes, that is absolutely correct. Being able to do filter pushdown will really help ALP (and a few other schemes) over block compression schemes like ZSTD. That is an added plus of ALP over ZSTD, in addition to better decompression speed. And I agree that in most cases decompression speed is given more weight than compression speed.
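
To make the filtering point concrete, here is a rough sketch of evaluating a `value > constant` predicate directly on encoded integers. It is not the ALP algorithm or any Parquet API: it assumes a simplified decimal-scaling plus frame-of-reference encoding in the spirit of ALP, and the names `encode` and `filter_gt` are hypothetical. It also assumes every value round-trips exactly (real ALP stores the ones that do not as exceptions) and glosses over boundary cases where decoding rounds.

```python
import math
from fractions import Fraction


def encode(values, exponent):
    """Simplified ALP-like encoding (assumption, not the real ALP layout):
    scale each double by 10**exponent, round to an integer, then store
    the integers as offsets from their minimum (frame of reference)."""
    scale = 10 ** exponent
    ints = [round(v * scale) for v in values]
    for v, d in zip(values, ints):
        # Real ALP would keep such values as exceptions; this sketch rejects them.
        if d / scale != v:
            raise ValueError(f"{v!r} does not round-trip at exponent {exponent}")
    base = min(ints)
    return base, [d - base for d in ints]


def filter_gt(base, deltas, exponent, constant):
    """Evaluate `value > constant` without decoding any value.
    For an integer d, d / 10**e > c is equivalent (in exact arithmetic)
    to d > floor(c * 10**e), so the constant is translated once into the
    encoded domain and compared against the stored offsets."""
    threshold = math.floor(Fraction(constant) * 10 ** exponent) - base
    return [d > threshold for d in deltas]


values = [1.25, 3.5, 2.75, 19.99]
base, deltas = encode(values, exponent=2)
mask = filter_gt(base, deltas, exponent=2, constant=2.75)

# The mask matches what filtering the decoded doubles would give.
assert mask == [v > 2.75 for v in values]
```

With a block compressor like ZSTD there is no equivalent shortcut, since the bytes have to be decompressed before any value can be compared.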

On Thu, Oct 16, 2025 at 5:53 PM Adrian Garcia Badaracco <[email protected]> wrote:

> Thank you for sharing that. Very interesting. I do think decompression
> speed is generally more important than compression speed. Another thing to
> consider is the possibility of operating on the compressed data e.g. for
> filtering: zstd data for example has to be decompressed before any
> filtering, arithmetic, etc. can be done. I believe at least filtering could
> be done on some of these other encodings. Apologies if this was discussed
> in the meeting already.
>
> > On Oct 16, 2025, at 4:47 PM, PRATEEK GAUR <[email protected]> wrote:
> >
> > Hi team,
> >
> > We spent some time evaluating ALP compression and decompression compared to
> > other encoding alternatives like CHIMP/GORILLA and compression techniques
> > like SNAPPY/LZ4/ZSTD. We presented these numbers to the community members
> > on October 15th in the biweekly parquet meeting. ( I can't seem to access
> > the recording, so please let me know what access rules I need to get to be
> > able to view it )
> >
> > We did this evaluation over some datasets pointed by the ALP paper and some
> > pointed by the parquet community.
> >
> > The results are available in the following document
> > <https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg/edit?tab=t.0>:
> > https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg
> >
> > Based on the numbers we see
> >
> > - ALP is comparable to ZSTD(level=1) in terms of compression ratio and
> >   much better compared to other schemes. (numbers in the sheet are bytes
> >   needed to encode each value )
> > - ALP going quite well in terms of decompression speed (numbers in the
> >   sheet are bytes decompressed per second)
> >
> > As next steps we will
> >
> > - Get the numbers for compression on top of byte stream split.
> > - Evaluate the algorithm over a few more datasets.
> > - Have an implementation in the arrow-parquet repo.
> >
> > Looking forward to feedback from the community.
> >
> > Best
> > Prateek and Dhirhan
