emkornfield commented on pull request #8949: URL: https://github.com/apache/arrow/pull/8949#issuecomment-794124675
@HedgehogCode thank you for the benchmark numbers. I think that, to avoid churn in this PR, we should keep the existing, working commons code path. The data you provided raises the urgency of finding a more performant solution.

The issue with the previous library is that it does not appear to address [dependent blocks](http://mail-archives.apache.org/mod_mbox/arrow-dev/202101.mbox/%3CCAJPUwMAPSTrdbu4vw=gjily9ciju0fvh_hdon5uaiwe2tk-...@mail.gmail.com%3E), which are emitted from C++. So there are three options:

1. Change the specification to require that dependent blocks be disabled.
2. Find a performant library that supports dependent blocks.
3. Ask for, or provide, a patch to the lz4-java library to support dependent blocks.

I think that, especially after ARROW-11899 is done, it should hopefully be fairly easy to try out different implementations (or use your own if interop with other languages isn't a requirement). If you are interested in contributing in this area, maybe coordinate with @liyafan82 to help out once the PR is merged?
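
For context on what "dependent blocks" means at the byte level, here is a minimal sketch, assuming the compressed buffers use the standard LZ4 frame format. In that format the FLG byte that follows the 4-byte magic number carries a block-independence bit (bit 5); when the bit is cleared, a block may reference data from earlier blocks. The class and method names (`Lz4FrameFlags`, `usesDependentBlocks`) are hypothetical and are not part of Arrow or lz4-java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Lz4FrameFlags {
    // Magic number that starts every LZ4 frame (stored little-endian on the wire).
    static final int LZ4_FRAME_MAGIC = 0x184D2204;
    // Bit 5 of the FLG byte: 1 = blocks are independent, 0 = blocks may depend on earlier ones.
    static final int BLOCK_INDEPENDENCE_BIT = 0x20;

    /** Hypothetical helper: true if the frame header declares dependent (linked) blocks. */
    static boolean usesDependentBlocks(byte[] frame) {
        if (frame.length < 5) {
            throw new IllegalArgumentException("buffer too short for an LZ4 frame header");
        }
        int magic = ByteBuffer.wrap(frame, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (magic != LZ4_FRAME_MAGIC) {
            throw new IllegalArgumentException("not an LZ4 frame");
        }
        int flg = frame[4] & 0xFF;
        // The top two bits of FLG hold the format version, which must be 01.
        if ((flg >>> 6) != 1) {
            throw new IllegalArgumentException("unsupported LZ4 frame version");
        }
        return (flg & BLOCK_INDEPENDENCE_BIT) == 0;
    }

    public static void main(String[] args) {
        // Hand-built headers for illustration only: magic bytes followed by a FLG byte.
        byte[] independent = {0x04, 0x22, 0x4D, 0x18, 0x60}; // FLG = 0110 0000
        byte[] dependent = {0x04, 0x22, 0x4D, 0x18, 0x40};   // FLG = 0100 0000
        System.out.println(usesDependentBlocks(independent));
        System.out.println(usesDependentBlocks(dependent));
    }
}
```

A check like this could be used to fail fast with a clear error when a reader that only handles independent blocks receives a frame written with linked blocks, rather than producing corrupt output.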
