milesgranger opened a new pull request, #39368:
URL: https://github.com/apache/arrow/pull/39368

   Greetings and happy holidays.
   
   I fully expect this to be closed and not taken further. I'm parking it here 
in case anyone expresses interest now or later.
   
   This was an experiment to swap out all of the compression libraries for 
[libcramjam](https://anaconda.org/conda-forge/libcramjam), which has been used 
for some years now in the Python compression library 
[cramjam](https://github.com/milesgranger/cramjam/tree/master/cramjam-python) 
(~5M downloads a month) without any significant issues reported.
   
   I did this mostly because I wanted to [develop a C API for 
libcramjam](https://github.com/milesgranger/cramjam/pull/119), and Arrow seemed 
like a great project to develop it against. It turned out to be one: the branch 
passes all but 6 pyarrow tests. Two of those failures relate to enabling 
one-shot (de)compression for Bzip2, which wasn't previously available (snappy 
frame de/compressors could probably be added fairly easily too). The other 
failures are almost surely due to my general ineptitude in C++ during the 
refactoring...
   
   To that point, the build is garbage. I didn't bother removing all the CMake 
logic for including those libraries. I only removed their headers from the 
`compression_x.cc` files, included cramjam via a global CXX flag, and hacked 
together the other build scripts. :sweat_smile: 
   
   Rough benchmarks suggest performance is on par, with the biggest difference 
being a ~10% improvement for the raw snappy format when reading a 128 MiB test 
parquet file.
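   The benchmark code isn't included here. The following is a stdlib-only sketch of how a before/after figure like "~10% improvement" could be computed: take the median of repeated timings for each implementation, then report the fraction of baseline time saved. `median_seconds` and `relative_improvement` are hypothetical helpers. The two `bz2` calls are stand-ins for "before" and "after" and are not Arrow's snappy or parquet code paths.

```python
import bz2
import statistics
import timeit


def median_seconds(fn, repeat: int = 5, number: int = 3) -> float:
    # Run `fn` `number` times per trial for `repeat` trials and return the
    # median per-call time in seconds (the median damps outlier trials).
    trials = timeit.repeat(fn, repeat=repeat, number=number)
    return statistics.median([t / number for t in trials])


def relative_improvement(baseline_s: float, candidate_s: float) -> float:
    # Fraction of baseline time saved: 0.10 means the candidate is ~10% faster.
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_s - candidate_s) / baseline_s


if __name__ == "__main__":
    payload = b"parquet page " * 50_000
    compressed = bz2.compress(payload)

    # Hypothetical "before" and "after" implementations of the same operation.
    def baseline():
        return bz2.decompress(compressed)

    def candidate():
        return bz2.BZ2Decompressor().decompress(compressed)

    # Verify that both produce identical output before timing anything.
    assert baseline() == candidate() == payload

    t_base = median_seconds(baseline)
    t_cand = median_seconds(candidate)
    print(f"baseline {t_base:.4f}s, candidate {t_cand:.4f}s, "
          f"improvement {relative_improvement(t_base, t_cand):+.1%}")
```

Checking output equality first keeps a fast but wrong implementation from showing up as a speedup.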
   
   A few Linux test wheels are available here: 
https://pypi.org/project/pyarrow-cramjam/ (`pip install pyarrow-cramjam`)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
