2010YOUY01 commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848428399
Thank you. This is a foundation for many late materialization optimizations 👏🏼 I have some high-level suggestions, looking forward to hearing your thoughts.

1. Use the term `(selection) bitmap` instead of `selection vector` to avoid confusion. I believe `selection vector` commonly refers (in several recent papers) to vectors of valid indices like `[1, 3, 9, ...]`. See https://db.cs.cmu.edu/papers/2021/ngom-damon2021.pdf

2. Perhaps after this PR, and before implementing any optimization using the selection bitmap, we should first extend `ExecutionPlan` to include related properties like `handles_filtered_input()` and `output_filtered_batches()`, to indicate whether an operator can process batches with metadata filter columns (bitmap/selection vector). We should also add an optimizer pass to validate that if an operator outputs filtered batches, its downstream operator must be able to handle them. I believe similar optimizations apply to most cardinality-reducing operators, so it's better to be cautious from the start.
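To make the two suggestions concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: `bitmap_to_selection_vector`, `OperatorProps`, and `validate_pipeline` are hypothetical names invented for illustration, the two boolean fields only mirror the proposed `handles_filtered_input()` and `output_filtered_batches()` properties, and the plan is assumed to be a simple linear chain rather than a tree.

```rust
/// Convert a selection bitmap (one flag per row, `true` = row is kept)
/// into a selection vector (the indices of the kept rows).
/// Hypothetical helper, not a DataFusion or Arrow API.
pub fn bitmap_to_selection_vector(bitmap: &[bool]) -> Vec<usize> {
    bitmap
        .iter()
        .enumerate()
        .filter_map(|(row, &keep)| if keep { Some(row) } else { None })
        .collect()
}

/// Hypothetical stand-in for the properties proposed for `ExecutionPlan`.
#[derive(Debug, Clone)]
pub struct OperatorProps {
    pub name: &'static str,
    /// Can this operator consume batches that carry a selection bitmap?
    pub handles_filtered_input: bool,
    /// May this operator emit batches that carry a selection bitmap?
    pub output_filtered_batches: bool,
}

/// Validate a linear pipeline ordered from source to sink (an assumption
/// made to keep the sketch short): whenever an operator may output
/// filtered batches, the operator directly downstream must handle them.
pub fn validate_pipeline(pipeline: &[OperatorProps]) -> Result<(), String> {
    for pair in pipeline.windows(2) {
        let (upstream, downstream) = (&pair[0], &pair[1]);
        if upstream.output_filtered_batches && !downstream.handles_filtered_input {
            return Err(format!(
                "{} outputs filtered batches, but {} cannot handle them",
                upstream.name, downstream.name
            ));
        }
    }
    Ok(())
}
```

In this sketch, a bitmap whose set positions are 1, 3, and 9 maps to the selection vector `[1, 3, 9]`, which is the distinction the first suggestion draws. The check in `validate_pipeline` rejects a chain in which a filter that emits bitmap-carrying batches feeds an operator that does not declare support for them; a real optimizer pass would walk the full plan tree instead of a slice.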