emkornfield commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-639219042
> I could try to adapt the Parquet code to use BitBlockCounter and see what the benchmarks look like?

@wesm If you can take a look at the benchmarks at https://github.com/apache/arrow/blob/7ad49eeca5215d9b2a56b6439f1bd6ea38888ea9/cpp/src/parquet/arrow/reader_writer_benchmark.cc#L238, and potentially revise them so we are aligned on what we are trying to improve, I can update this PR accordingly. I think there are really two options:

1. Remove BitRunReader entirely and use BitBlockCounter.
2. Use BitBlockCounter in addition to BitRunReader.

The right choice depends on what percentage of values we expect to be null. My intuition is that very high and very low null rates are both likely, but you probably have a better sense of where exactly "high" and "low" begin.
