iChauster commented on PR #13366:
URL: https://github.com/apache/arrow/pull/13366#issuecomment-1163390116

   > > For a filtering operation I think there is an extra parameter which is 
the selectivity (what percentage of rows are kept). I think it would be 
valuable to add that as a parameter but it would make test data generation more 
complicated.
   > 
   > @westonpace one idea I had to benchmark selectivity is perhaps using the 
existing 'null_percent/proportion' generators we already have, and then using 
`is_null` as the filter. Let me know if you think that would be the right 
approach.
   
   After thinking about it some more, maybe we should use a numerical array 
with a uniform distribution, which would allow us to test multiple filter passes. 
I think with the other approach we can only get two passes (filter out nulls, 
then filter by true / false).
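
A minimal sketch of the uniform-distribution idea, in plain Python rather than the Arrow C++ benchmark code. The helper names (`make_uniform_column`, `filter_below`, `run_filter_passes`) and the thresholds are hypothetical and only illustrate how a threshold on uniformly distributed values maps to a selectivity, and how successive thresholds give more than two filter passes:

```python
import random


def make_uniform_column(num_rows, seed=42):
    """Generate values drawn uniformly from [0, 1) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_rows)]


def filter_below(values, threshold):
    """Keep rows whose value is below the threshold.

    For uniform values in [0, 1), the expected fraction of rows kept
    (the selectivity) is approximately equal to the threshold.
    """
    return [v for v in values if v < threshold]


def run_filter_passes(values, thresholds):
    """Apply one filter per threshold, each on the previous pass's output.

    Returns the number of surviving rows after each pass.
    """
    counts = []
    current = values
    for threshold in thresholds:
        current = filter_below(current, threshold)
        counts.append(len(current))
    return counts


# Hypothetical setup: three chained passes, each keeping roughly half
# of the rows that survived the previous pass.
column = make_uniform_column(100_000)
thresholds = [0.5, 0.25, 0.125]
counts = run_filter_passes(column, thresholds)

for threshold, count in zip(thresholds, counts):
    print(f"threshold={threshold}: kept {count} rows "
          f"({count / len(column):.3f} of the input)")
```

With a null-percentage generator the selectivity is fixed when the data is generated, whereas here the same column can be reused for any selectivity by changing the threshold.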

