iChauster commented on PR #13366:
URL: https://github.com/apache/arrow/pull/13366#issuecomment-1163390116
> > For a filtering operation I think there is an extra parameter which is the selectivity (what percentage of rows are kept). I think it would be valuable to add that as a parameter but it would make test data generation more complicated.
>
> @westonpace one idea I had to benchmark selectivity is perhaps using the existing 'null_percent/proportion' generators we already have, and then using `is_null` as the filter. Let me know if you think that would be the right approach.

After thinking some more about it, maybe we should instead use a numerical array with a uniform distribution, which would let us test multiple filter passes. With the other approach, I think we can only get two passes (filter out nulls, then filter by true/false).
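
To make the uniform-distribution idea concrete, here is a rough sketch in plain Python rather than the Arrow C++ benchmark code. All names in it (`make_uniform_column`, `filter_below`, `chained_selectivity`) are hypothetical and only illustrate the data-generation idea: if values are drawn uniformly from [0, 1), a filter `value < t` keeps roughly a fraction `t` of the rows, so any number of passes with decreasing thresholds can be chained.

```python
import random


def make_uniform_column(num_rows, seed=0):
    """Generate num_rows floats drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_rows)]


def filter_below(values, threshold):
    """Keep only the values strictly below the threshold."""
    return [v for v in values if v < threshold]


def chained_selectivity(values, thresholds):
    """Apply one filter pass per threshold, in order.

    Returns, for each pass, the fraction of the ORIGINAL rows that
    survive after that pass.
    """
    total = len(values)
    kept = values
    fractions = []
    for threshold in thresholds:
        kept = filter_below(kept, threshold)
        fractions.append(len(kept) / total)
    return fractions


# Three filter passes with decreasing thresholds (assumed values).
column = make_uniform_column(100_000, seed=42)
fractions = chained_selectivity(column, [0.75, 0.5, 0.1])
```

With decreasing thresholds, each entry of `fractions` should land close to the corresponding threshold, so the selectivity of every pass can be chosen directly instead of being limited to the two passes available with the null-based approach.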