alamb opened a new pull request, #548: URL: https://github.com/apache/parquet-format/pull/548
This is a proposed implementation of the ALP encodings. It is based on (largely a reformatted version of) @prtkgaur's [ALP Encoding Specification Google Doc](https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit).

### Rationale for this change

This encoding has the following properties:

* Targets real-world floating-point (IEEE 754) data.
* Achieves high compression ratios (close to ZSTD).
* Decompresses much faster than ZSTD (and other floating-point algorithms).
* Supports random row access (can decode individual rows without decompressing an entire page).

See Mailing List Discussion: https://lists.apache.org/thread/tjtln1mmjqfoql1ls2dw9xpdk91r1909

<img width="696" height="468" alt="Screenshot 2026-01-14 at 2 45 35 PM" src="https://github.com/user-attachments/assets/756eb156-f0d6-4ef1-90d4-04d71a3b11f0" />

Source: [ALP Results Document](https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg/edit?tab=t.0#heading=h.5xf60mx6q7xk)

(TODO: summarize the mailing list discussion here)

### What changes are included in this PR?

- Closes https://github.com/apache/parquet-format/issues/533

### Do these changes have PoC implementations?

Yes

- [ ] C/C++: https://github.com/apache/arrow/pull/48345

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
