milesgranger opened a new pull request, #39368: URL: https://github.com/apache/arrow/pull/39368
Greetings and happy holidays.

I fully expect this to be closed and not taken further. I'm parking it here in case anyone expresses interest now or later.

This was an experiment to swap out all compression libraries for [libcramjam](https://anaconda.org/conda-forge/libcramjam), which has been used for some years now in the Python compression library [cramjam](https://github.com/milesgranger/cramjam/tree/master/cramjam-python) with ~5M downloads a month and no significant issues reported. I did this mostly because I wanted to [develop a C API for libcramjam](https://github.com/milesgranger/cramjam/pull/119) and thought arrow was a great project to develop it against.

Turns out it was, and it passes all but 6 pyarrow tests, 2 of which are related to enabling one-shot for Bzip2, which wasn't previously available (I could also add snappy frame de/compressors pretty easily, I think). The other failures are almost surely due to my general ineptitude in C++ during refactoring...

To that point, the build is garbage. I didn't bother to remove all the CMake logic for including those libs; I only removed their headers from the compression_x.cc files, included cramjam with a global CXX flag, and hacked together the other build scripts. :sweat_smile:

Some rough benchmarks appear to be on par for performance, with the biggest difference being a ~10% improvement for the snappy raw format when reading a 128 MiB test Parquet file.

A few Linux test wheels are available here: https://pypi.org/project/pyarrow-cramjam/ (`pip install pyarrow-cramjam`)
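For readers unfamiliar with the distinction behind "one-shot" Bzip2 mentioned above, here is a minimal sketch of one-shot versus streaming compression. It uses Python's standard-library `bz2` module purely as an illustration; it is not the Arrow codec interface or the libcramjam C API, and the helper names `compress_one_shot` and `compress_streaming` are hypothetical.

```python
import bz2


def compress_one_shot(data: bytes) -> bytes:
    """Compress a whole in-memory buffer in a single call."""
    return bz2.compress(data)


def compress_streaming(chunks) -> bytes:
    """Compress an iterable of byte chunks incrementally."""
    compressor = bz2.BZ2Compressor()
    out = [compressor.compress(chunk) for chunk in chunks]
    # flush() emits whatever the compressor is still buffering
    out.append(compressor.flush())
    return b"".join(out)


payload = b"arrow " * 10_000
chunks = [payload[i:i + 4096] for i in range(0, len(payload), 4096)]

# Both paths must decompress back to the original bytes.
assert bz2.decompress(compress_one_shot(payload)) == payload
assert bz2.decompress(compress_streaming(chunks)) == payload
```

The one-shot form needs the full input up front, while the streaming form can be fed data as it arrives; a codec that offers only the latter can still serve one-shot callers, but it has to be wrapped to do so.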
