goldmedal commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848568844

   Thanks @2010YOUY01 for the suggestions.
   
   > 1. Use the term `(selection) bitmap` instead of `selection vector` to 
avoid confusion. I believe `selection vector` commonly refers (in several 
recent papers) to vectors of valid indices like `[1, 3, 9, ...]`. See 
https://db.cs.cmu.edu/papers/2021/ngom-damon2021.pdf
   
   Agreed, `selection bitmap` is a better name for the field. We can call 
`array.as_boolean().values().set_indices()` to get the true selection vector 
(the indices of the set bits) from a boolean column. I did this in my 
hash-aggregation implementation: 
https://github.com/goldmedal/datafusion/pull/4#discussion_r2051711975
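
To make the bitmap vs. selection vector distinction concrete, here is a minimal std-only sketch of the conversion. It does not use Arrow; `pack_bitmap` and `bitmap_to_selection_vector` are hypothetical helper names, and the LSB-first `u64` word layout is an assumption chosen for illustration rather than a description of how `set_indices()` is implemented.

```rust
/// Pack a boolean mask into a selection bitmap: one bit per row,
/// stored LSB-first inside 64-bit words (layout assumed for this sketch).
pub fn pack_bitmap(mask: &[bool]) -> Vec<u64> {
    let mut words = vec![0u64; (mask.len() + 63) / 64];
    for (row, &keep) in mask.iter().enumerate() {
        if keep {
            words[row / 64] |= 1u64 << (row % 64);
        }
    }
    words
}

/// Convert a selection bitmap into a selection vector, i.e. the ascending
/// list of row indices whose bit is set. Bits at positions >= `len` are ignored.
pub fn bitmap_to_selection_vector(words: &[u64], len: usize) -> Vec<usize> {
    let mut indices = Vec::new();
    for (word_idx, &word) in words.iter().enumerate() {
        let base = word_idx * 64;
        let mut bits = word;
        while bits != 0 {
            let row = base + bits.trailing_zeros() as usize;
            if row >= len {
                // Rows are visited in ascending order, so nothing valid remains.
                return indices;
            }
            indices.push(row);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    indices
}
```

For a ten-row mask with only rows 1, 3 and 9 set, `bitmap_to_selection_vector(&pack_bitmap(&mask), mask.len())` returns `[1, 3, 9]`, which is the index-list form that the term "selection vector" usually refers to.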
   
   > 2. Perhaps after this PR—and before implementing any optimization using 
the selection bitmap—we should first extend `ExecutionPlan` to include related 
properties like `handles_filtered_input()` and `output_filtered_batches()`, to 
indicate whether an operator can process batches with metadata filter columns 
(bitmap/selection vector). We should also add an optimizer pass to validate 
that if an operator outputs filtered batches, its downstream operator must be 
able to handle them.
   > 
   
   Providing an API to check whether an operator handles the selection bitmap 
makes sense to me 👍. I'll consider how to do it.
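
As a starting point for discussion, the validation rule from suggestion 2 could be sketched roughly as follows. This is not DataFusion code: `FilteredBatchSupport` and `Op` are hypothetical stand-ins for the proposed `ExecutionPlan` properties, and the plan is simplified to a linear pipeline ordered from source to sink, whereas a real optimizer pass would walk a plan tree.

```rust
/// Hypothetical stand-in for the two properties proposed for `ExecutionPlan`.
pub trait FilteredBatchSupport {
    fn name(&self) -> &str;
    /// Can this operator consume batches that carry a selection bitmap?
    fn handles_filtered_input(&self) -> bool;
    /// Does this operator emit batches that carry a selection bitmap?
    fn output_filtered_batches(&self) -> bool;
}

/// Minimal operator description used only for this sketch.
pub struct Op {
    pub name: &'static str,
    pub handles: bool,
    pub outputs: bool,
}

impl FilteredBatchSupport for Op {
    fn name(&self) -> &str {
        self.name
    }
    fn handles_filtered_input(&self) -> bool {
        self.handles
    }
    fn output_filtered_batches(&self) -> bool {
        self.outputs
    }
}

/// Check each adjacent (upstream, downstream) pair: if the upstream operator
/// outputs filtered batches, the downstream operator must be able to handle them.
pub fn validate_pipeline(ops: &[&dyn FilteredBatchSupport]) -> Result<(), String> {
    for pair in ops.windows(2) {
        let (upstream, downstream) = (pair[0], pair[1]);
        if upstream.output_filtered_batches() && !downstream.handles_filtered_input() {
            return Err(format!(
                "{} outputs filtered batches but {} cannot handle them",
                upstream.name(),
                downstream.name()
            ));
        }
    }
    Ok(())
}
```

With this shape, a producer that emits filtered batches followed by a bitmap-aware consumer validates successfully, while the same producer followed by an unaware consumer returns an error naming both operators.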
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

