emkornfield commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-628384930
@wesm OK, I did some more in-depth sampling. It looks like this new algorithm is a win for 0-5% nulls, then a regression until somewhere between 45% and 50% nulls, and then likely a win again at larger percentages of nulls. I'll add a special case to estimate which algorithm to use (this one or the one-by-one approach), based on the percentage of nulls found by sampling the first N elements of the bitmap vector.
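
For illustration, here is a rough sketch of what such a selection heuristic could look like. It is not the code in this PR: the names `Strategy`, `EstimateNullFraction`, and `ChooseStrategy` are hypothetical, and the cutoffs (5% and 50%) are assumptions taken loosely from the ranges quoted above, since the upper crossover was only measured as lying between 45% and 50%. It assumes an Arrow-style validity bitmap (least-significant-bit order, set bit means valid).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not part of the Arrow API.
enum class Strategy { kBatched, kOneByOne };

// Estimates the fraction of nulls by scanning the first `sample_size` bits of
// an LSB-ordered validity bitmap, where a set bit means "valid".
double EstimateNullFraction(const uint8_t* bitmap, std::size_t length,
                            std::size_t sample_size) {
  const std::size_t n = std::min(length, sample_size);
  if (n == 0) return 0.0;
  std::size_t nulls = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const bool valid = ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
    if (!valid) ++nulls;
  }
  return static_cast<double>(nulls) / static_cast<double>(n);
}

// Picks the batched algorithm at the low and high ends of the null range and
// falls back to one-by-one processing in between.
Strategy ChooseStrategy(double null_fraction) {
  constexpr double kLowCutoff = 0.05;   // assumed from the "0-5%" range
  constexpr double kHighCutoff = 0.50;  // assumed; crossover was seen at 45-50%
  if (null_fraction <= kLowCutoff || null_fraction >= kHighCutoff) {
    return Strategy::kBatched;
  }
  return Strategy::kOneByOne;
}
```

Sampling only a prefix keeps the estimate cheap, at the cost of being misleading when nulls are clustered later in the column, so the real cutoffs and sample size would need to be tuned against benchmarks.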
