2010YOUY01 commented on PR #15423:
URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848428399

   Thank you. This is a foundation for many late-materialization optimizations 
👏🏼 I have some high-level suggestions and look forward to hearing your thoughts.
   
   1. Use the term `(selection) bitmap` instead of `selection vector` to avoid 
confusion. I believe that in several recent papers `selection vector` refers 
to a vector of valid row indices such as `[1, 3, 9, ...]`. See 
https://db.cs.cmu.edu/papers/2021/ngom-damon2021.pdf
   
   2. Perhaps after this PR, and before implementing any optimization that 
uses the selection bitmap, we should first extend `ExecutionPlan` with related 
properties such as `handles_filtered_input()` and `output_filtered_batches()`, 
which indicate whether an operator can consume, or may produce, batches that 
carry a metadata filter column (bitmap/selection vector). We should also add 
an optimizer pass to validate that if an operator outputs filtered batches, 
its downstream operator is able to handle them.
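
   A rough sketch of the validation pass described in point 2, to make the 
idea concrete. Everything here is hypothetical: `PlanNode` is a simplified 
stand-in and not DataFusion's `ExecutionPlan`, the two boolean fields mirror 
the proposed `handles_filtered_input()` and `output_filtered_batches()` 
properties, and `validate_filtered_batches` is an assumed name rather than an 
existing optimizer rule.

```rust
/// Hypothetical, simplified plan node; NOT DataFusion's `ExecutionPlan`.
#[derive(Debug)]
struct PlanNode {
    name: &'static str,
    /// Mirrors the proposed `handles_filtered_input()` property.
    handles_filtered_input: bool,
    /// Mirrors the proposed `output_filtered_batches()` property.
    output_filtered_batches: bool,
    /// Input operators; each child feeds its batches to this node.
    children: Vec<PlanNode>,
}

/// Small constructor helper used only for illustration.
fn node(
    name: &'static str,
    handles_filtered_input: bool,
    output_filtered_batches: bool,
    children: Vec<PlanNode>,
) -> PlanNode {
    PlanNode { name, handles_filtered_input, output_filtered_batches, children }
}

/// Walks the plan top-down and reports the first parent/child pair where
/// the child may emit filtered batches that the parent cannot consume.
fn validate_filtered_batches(parent: &PlanNode) -> Result<(), String> {
    for child in &parent.children {
        if child.output_filtered_batches && !parent.handles_filtered_input {
            return Err(format!(
                "{} emits filtered batches, but {} cannot handle them",
                child.name, parent.name
            ));
        }
        // Recurse so that every edge in the plan tree is checked.
        validate_filtered_batches(child)?;
    }
    Ok(())
}
```

   With this shape, a filter that emits bitmap-carrying batches directly 
under an operator that has not opted in is rejected at plan time instead of 
silently producing wrong results.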
   
   I believe similar optimizations apply to most cardinality-reducing 
operators, so it’s better to be cautious from the start.
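
   To illustrate the terminology distinction from point 1, here is a minimal 
sketch of the two representations and how they convert into each other. The 
function names are hypothetical, and `Vec<bool>` is a simplification chosen 
for readability; a real implementation would presumably use a packed Arrow 
boolean buffer.

```rust
/// Selection bitmap -> selection vector.
/// The bitmap has one flag per row; the selection vector lists the
/// indices of the rows whose flag is set.
fn bitmap_to_selection_vector(bitmap: &[bool]) -> Vec<usize> {
    bitmap
        .iter()
        .enumerate()
        .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
        .collect()
}

/// Selection vector -> selection bitmap for a batch of `num_rows` rows.
/// Panics if an index is out of range for the batch.
fn selection_vector_to_bitmap(indices: &[usize], num_rows: usize) -> Vec<bool> {
    let mut bitmap = vec![false; num_rows];
    for &i in indices {
        bitmap[i] = true;
    }
    bitmap
}
```

   The bitmap has a fixed size per batch regardless of selectivity, while the 
selection vector shrinks as fewer rows survive, which is why the two names 
should not be used interchangeably.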
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
