xudong963 commented on PR #22343:
URL: https://github.com/apache/datafusion/pull/22343#issuecomment-4609938359

   I haven't reviewed the PR yet, but one related implementation that comes to mind is 
DuckDB's predicate reordering.
   
   DuckDB implements this as a dedicated `REORDER_FILTER` optimizer pass:
   
https://github.com/duckdb/duckdb/blob/9ddf2e47203038eadab10735d0b5f301893c724e/src/optimizer/optimizer.cpp#L395-L399
   
   The optimizer visits logical filters and nested conjunctions:
   
https://github.com/duckdb/duckdb/blob/9ddf2e47203038eadab10735d0b5f301893c724e/src/optimizer/expression_heuristics.cpp#L17-L33
   
   A notable safety choice is that DuckDB skips reordering entirely if any 
predicate can throw:
   
https://github.com/duckdb/duckdb/blob/9ddf2e47203038eadab10735d0b5f301893c724e/src/optimizer/expression_heuristics.cpp#L49-L53
   
   It also uses a numeric heuristic cost model:
   
https://github.com/duckdb/duckdb/blob/9ddf2e47203038eadab10735d0b5f301893c724e/src/optimizer/expression_heuristics.cpp#L89-L167
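
To make the two points above concrete (skip reordering when a predicate can throw, otherwise order by a static cost), here is a minimal Python sketch. It is not DuckDB's code: the `Predicate` type, the `can_throw` flag, and the cost numbers are hypothetical stand-ins, and the real cost model lives in the file linked above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Predicate:
    """One conjunct of an AND filter (hypothetical type for illustration)."""
    name: str
    cost: int                # assumed static cost estimate; lower runs first
    can_throw: bool = False  # assumed flag: evaluation may raise an error


def reorder_conjunction(predicates: Sequence[Predicate]) -> List[Predicate]:
    """Return the conjuncts in ascending estimated cost.

    If any conjunct can throw, the original order is returned unchanged,
    so reordering never moves a fallible predicate ahead of the others.
    """
    if any(p.can_throw for p in predicates):
        return list(predicates)
    # sorted() is stable, so equal-cost conjuncts keep their written order.
    return sorted(predicates, key=lambda p: p.cost)
```

With `[Predicate("x <> 0", 20), Predicate("10 / x > 2", 5, can_throw=True)]` the function returns the input order, because a pure cost sort would move the division in front of the `x <> 0` check and could surface an error that the written order happened to avoid.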
   
   DuckDB then goes one step further at **execution time**: filter order can be 
adaptively adjusted based on observed runtime (this is cool):
   
https://github.com/duckdb/duckdb/blob/9ddf2e47203038eadab10735d0b5f301893c724e/src/execution/adaptive_filter.cpp#L109-L166
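
As a rough illustration of runtime feedback, the sketch below re-ranks predicates every `adapt_every` rows by observed time per evaluation divided by observed drop rate. This is a simplified technique I picked for the example and does not reproduce the algorithm in the linked DuckDB file. The class name, `adapt_every`, and the injectable `clock` are all assumptions.

```python
import time
from typing import Callable, List, Sequence


class AdaptiveFilter:
    """Short-circuit AND filter that periodically re-ranks its predicates."""

    def __init__(self, predicates: Sequence[Callable[[object], bool]],
                 adapt_every: int = 100,
                 clock: Callable[[], float] = time.perf_counter):
        self.predicates = list(predicates)
        n = len(self.predicates)
        self.order: List[int] = list(range(n))  # current evaluation order
        self.elapsed = [0.0] * n                # total time spent per predicate
        self.seen = [0] * n                     # rows each predicate evaluated
        self.dropped = [0] * n                  # rows each predicate rejected
        self.adapt_every = adapt_every
        self.clock = clock
        self.rows = 0

    def matches(self, row) -> bool:
        result = True
        for i in self.order:
            start = self.clock()
            ok = self.predicates[i](row)
            self.elapsed[i] += self.clock() - start
            self.seen[i] += 1
            if not ok:
                self.dropped[i] += 1
                result = False
                break  # short-circuit: later predicates are skipped
        self.rows += 1
        if self.rows % self.adapt_every == 0:
            # Stable sort: cheapest cost per rejected row goes first.
            self.order.sort(key=self._rank)
        return result

    def _rank(self, i: int) -> float:
        if self.seen[i] == 0:
            return 0.0  # no observations yet, so try it early
        avg_time = self.elapsed[i] / self.seen[i]
        drop_rate = self.dropped[i] / self.seen[i]
        return avg_time / max(drop_rate, 1e-9)
```

Reordering only changes how much work is done, not which rows pass, provided the predicates are infallible and side-effect free. Note that predicates later in the order only see rows that survived the earlier ones, so their statistics are biased. A production design would need to account for that.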
   
   So the DuckDB design has three useful ideas:
   1. keep predicate reordering as a separate optimizer concern;
   2. avoid reordering fallible predicates;
   3. use static heuristic ordering as an initial order, then let runtime 
feedback improve it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

