alamb opened a new pull request, #49154:
URL: https://github.com/apache/arrow/pull/49154

   This builds on the following PR from @prtkgaur
   - https://github.com/apache/arrow/pull/48345
   
   It contains a binary that creates files using the new ALP encoding proposed here:
   - https://github.com/apache/parquet-format/pull/548
   
   I don't intend to merge this PR; rather, I plan to use it to create test Parquet files, and am posting it in case anyone else is interested.
   
   To build
   ```shell
     cd arrow/cpp
     cmake -S . -B build -DARROW_PARQUET=ON -DPARQUET_BUILD_EXAMPLES=ON \
       -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
       -DARROW_MIMALLOC=OFF -DARROW_SIMD_LEVEL=NONE \
       -DARROW_RUNTIME_SIMD_LEVEL=NONE
     MAKEFLAGS=-j8 cmake --build build --target parquet-write-parquet
   ```
   
   To run
   ```shell
   cd arrow/cpp
   ./build/release/parquet-write-parquet --encoding ALP /tmp
   ```
   
   This writes a file like the following to /tmp:
[single_f64_ALP.zip](https://github.com/user-attachments/files/25097841/single_f64_ALP.zip)
   
   TODO: make sure the following patterns from the [spec](https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit) are covered:
   1. pages with no exceptions
   2. encoding with exceptions and NaN, Inf, etc.
   3. multiple ALP vector sizes (1 -> 15 == 65k)
   4. Both f32 and f64 variants
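
   When picking test values for patterns 1, 2 and 4, it may help to check up front which values would become exceptions. Below is a minimal sketch, assuming the basic ALP idea from the original ALP paper: scale a value by 10^e * 10^-f, round it to an integer, and treat any value that does not decode back to exactly the same float as an exception. The helpers `RoundTrips` and `CountExceptions`, the exponent range 0..10, and the use of `int64_t` and `std::llround` for both f32 and f64 are hypothetical simplifications for illustration. This is not the Arrow implementation or the exact algorithm in the parquet-format spec.

   ```cpp
   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <cstdint>
   #include <limits>
   #include <vector>

   // Exact power-of-ten literals for exponents 0..10 (hypothetical range for
   // this sketch; the spec defines its own limits for f32 and f64).
   constexpr double kPow10[] = {1e0, 1e1, 1e2, 1e3, 1e4, 1e5,
                                1e6, 1e7, 1e8, 1e9, 1e10};
   constexpr double kInvPow10[] = {1e0,  1e-1, 1e-2, 1e-3, 1e-4, 1e-5,
                                   1e-6, 1e-7, 1e-8, 1e-9, 1e-10};

   // Hypothetical helper: true if `v` survives an ALP-style round trip with
   // exponent `e` and factor `f`:
   //   enc = round(v * 10^e * 10^-f),  decoded = enc * 10^f * 10^-e.
   // Values that fail would have to be stored as exceptions.
   template <typename T>
   bool RoundTrips(T v, int e, int f) {
     assert(0 <= f && f <= e && e <= 10);
     // NaN and +/-Inf have no integer representation.
     if (!std::isfinite(v)) return false;
     // -0.0 would decode as +0.0 and lose its sign bit.
     if (v == 0 && std::signbit(v)) return false;
     const T scaled =
         v * static_cast<T>(kPow10[e]) * static_cast<T>(kInvPow10[f]);
     // Sketch uses int64 for both widths; reject values that would overflow.
     if (!(std::fabs(scaled) < static_cast<T>(9.0e18))) return false;
     const std::int64_t enc = std::llround(scaled);
     const T decoded = static_cast<T>(enc) * static_cast<T>(kPow10[f]) *
                       static_cast<T>(kInvPow10[e]);
     return decoded == v;
   }

   // Hypothetical helper: number of values in a vector that would be exceptions.
   template <typename T>
   std::size_t CountExceptions(const std::vector<T>& values, int e, int f) {
     std::size_t n = 0;
     for (T v : values) {
       if (!RoundTrips(v, e, f)) ++n;
     }
     return n;
   }
   ```

   With e = 2 and f = 0, two-decimal doubles such as 1.25 and 3.14 round-trip (pattern 1), while NaN, infinity and -0.0 are always counted as exceptions in this sketch (pattern 2). Because the helpers are templated, the same check covers both `float` and `double` inputs (pattern 4).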
   
   

