alamb opened a new pull request, #548:
URL: https://github.com/apache/parquet-format/pull/548

   This is a proposed implementation of the ALP encodings.
   
   It is based on (largely a reformatted version of) @prtkgaur's [ALP Encoding Specification Google Doc](https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit).
   
   ### Rationale for this change
   
   This encoding has the following properties:
   * Targets real-world floating-point (IEEE 754) data.
   * Achieves high compression ratios (close to those of zstd).
   * Decompresses much faster than zstd (and other floating-point compression algorithms).
   * Supports random row access (individual rows can be decoded without decompressing an entire page).
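
To make the last property concrete, here is a minimal sketch of the decimal-scaling idea behind ALP, written for illustration only. It is not taken from the specification in this PR: the function names `alp_encode` and `alp_decode_at` are hypothetical, a single decimal `exponent` is assumed for the whole batch, and the second scaling factor, frame-of-reference step, and bit-packing described in the ALP paper are omitted. Values that do not round-trip exactly are kept as exceptions, which is what keeps the scheme lossless.

```python
import math
import struct


def _bits(x):
    # Raw IEEE 754 bytes, so that -0.0 and 0.0 are told apart.
    return struct.pack("<d", x)


def alp_encode(values, exponent):
    """Scale each double by 10**exponent and store it as an integer.

    Hypothetical helper: values that do not round-trip bit-for-bit are
    stored verbatim in an exceptions map keyed by row index.
    """
    scale = 10 ** exponent
    encoded, exceptions = [], {}
    for i, v in enumerate(values):
        scaled = v * scale
        if math.isfinite(scaled):
            d = round(scaled)
            if _bits(d / scale) == _bits(v):
                encoded.append(d)
                continue
        encoded.append(0)  # placeholder; the real value lives in exceptions
        exceptions[i] = v
    return encoded, exceptions


def alp_decode_at(encoded, exceptions, i, exponent):
    """Decode row i alone, without touching any other row."""
    if i in exceptions:
        return exceptions[i]
    return encoded[i] / 10 ** exponent


values = [1.5, 2.25, 0.1, 3.14159265358979]
encoded, exceptions = alp_encode(values, exponent=2)
decoded = [alp_decode_at(encoded, exceptions, i, 2) for i in range(len(values))]
```

Because each row decodes from its own integer (or its own exception entry), reading one row does not depend on its neighbours, which is the property the random-access bullet describes. In this sketch the small integers in `encoded` are what a real encoder would then bit-pack.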
   
   See the mailing list discussion: https://lists.apache.org/thread/tjtln1mmjqfoql1ls2dw9xpdk91r1909
   
   
   
   <img width="696" height="468" alt="Screenshot 2026-01-14 at 2 45 35 PM" src="https://github.com/user-attachments/assets/756eb156-f0d6-4ef1-90d4-04d71a3b11f0" />
   
   Source: [ALP Results Document](https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg/edit?tab=t.0#heading=h.5xf60mx6q7xk)
   
   (TODO: summarize the mailing list discussion here)
   
   
   ### What changes are included in this PR?
   - Closes https://github.com/apache/parquet-format/issues/533
   
   ### Do these changes have PoC implementations?
   Yes
   - [ ] C/C++: https://github.com/apache/arrow/pull/48345
   
   
   

