[GitHub] [arrow-datafusion] alamb commented on issue #1972: DataFusion Optimizer framework discussion

GitBox Wed, 15 Jun 2022 03:41:19 -0700


alamb commented on issue #1972:
URL: 
https://github.com/apache/arrow-datafusion/issues/1972#issuecomment-1156308944


   > My opinion for DataFusion's optimizer framework is that we should continue 
to focus on the heuristic planner approach in the current phase, implement an 
optimizer framework like SparkSQL's Catalyst optimizer, and make it relatively 
easy to add new rules. In the future, we can go with the adaptive execution 
approach.
   
   Thank you @mingmwang for the writeup. I would second your assertion that 
almost all successful real-world (e.g. commercial) query optimizers are not 
implemented with a Cascades-like framework, but instead are some combination of 
heuristics and cost models.
   
   I also agree with the point that cost models have unsolved error propagation 
issues: in my experience, after about 2-3 joins the output cardinality 
estimate is basically a guess, even with advanced statistics like histograms.
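   
   To make the error propagation point concrete, here is a minimal sketch. It 
assumes that each join's selectivity estimate is off by at most some constant 
factor and that these errors compound multiplicatively. The function name and 
the 3x per-join factor are illustrative assumptions, not measurements from 
DataFusion or any other system.

```rust
/// Worst-case multiplicative error in the cardinality estimate after
/// `joins` joins, ASSUMING each join's selectivity estimate is off by at
/// most `per_join_error` and that the errors compound multiplicatively.
/// (Hypothetical helper for illustration only.)
fn worst_case_error(per_join_error: f64, joins: u32) -> f64 {
    per_join_error.powi(joins as i32)
}

fn main() {
    // 3.0 is an arbitrary illustrative per-join error factor.
    for joins in 1..=4 {
        println!(
            "{} join(s): estimate may be off by up to {:.0}x",
            joins,
            worst_case_error(3.0, joins)
        );
    }
}
```

   Under these assumptions, a modest per-join error grows exponentially with 
the number of joins, which is why estimates a few joins deep carry little 
information.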
   
   What I would like to see in DataFusion is:
   1. A solid "classic" heuristic optimizer as a default
   2. Sufficient extension points that anyone who wants to experiment / create 
/ use a different optimizer strategy can easily do so.
   
   In my mind this is like `LLVM`: it provides a "state of the art" foundation 
that users can then customize as they need.
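
   As a rough illustration of the two points above (a default heuristic 
optimizer plus extension points), the sketch below shows one possible shape: 
an optimizer that is just an ordered list of rewrite rules, so users can add, 
remove, or replace rules. The `Plan`, `Rule`, `MergeLimits`, and `Optimizer` 
names are hypothetical toys for this example and are not DataFusion's actual 
types or API.

```rust
// Toy logical plan for illustration; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
}

// Hypothetical extension point: a rule rewrites a plan into an
// equivalent plan.
trait Rule {
    fn name(&self) -> &str;
    fn rewrite(&self, plan: Plan) -> Plan;
}

// Example heuristic rule: collapse directly nested limits into the
// smaller of the two.
struct MergeLimits;

impl Rule for MergeLimits {
    fn name(&self) -> &str {
        "merge_limits"
    }

    fn rewrite(&self, plan: Plan) -> Plan {
        match plan {
            // Rewrite the child first, then merge if it is also a limit.
            Plan::Limit { n, input } => match self.rewrite(*input) {
                Plan::Limit { n: inner, input } => Plan::Limit { n: n.min(inner), input },
                other => Plan::Limit { n, input: Box::new(other) },
            },
            Plan::Filter { predicate, input } => Plan::Filter {
                predicate,
                input: Box::new(self.rewrite(*input)),
            },
            scan => scan,
        }
    }
}

// The optimizer applies each registered rule once, in order.
struct Optimizer {
    rules: Vec<Box<dyn Rule>>,
}

impl Optimizer {
    fn optimize(&self, plan: Plan) -> Plan {
        self.rules.iter().fold(plan, |p, rule| rule.rewrite(p))
    }
}

fn main() {
    let plan = Plan::Limit {
        n: 10,
        input: Box::new(Plan::Limit {
            n: 5,
            input: Box::new(Plan::Scan("t".to_string())),
        }),
    };
    // Users who want a different strategy supply their own rule list.
    let optimizer = Optimizer { rules: vec![Box::new(MergeLimits)] };
    for rule in &optimizer.rules {
        println!("registered rule: {}", rule.name());
    }
    println!("{:?}", optimizer.optimize(plan));
}
```

   Keeping the rule list as plain data is one design choice among several; it 
keeps the default optimizer simple while leaving room to swap in a different 
strategy behind the same interface.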


