clflushopt commented on code in PR #14735:
URL: https://github.com/apache/datafusion/pull/14735#discussion_r1976136920

##########
docs/source/library-user-guide/query-optimizer.md:
##########
@@ -388,3 +388,119 @@ In the following example, the `type_coercion` and `simplify_expressions` passes
 ```
 
 [df]: https://crates.io/crates/datafusion
+
+## Thinking about Query Optimization
+
+Query optimization in DataFusion uses a cost based model. The cost based model
+relies on table and column level statistics to estimate selectivity; selectivity
+estimates are an important piece in cost analysis for filters and projections
+as they allow estimating the cost of joins and filters.
+
+An important piece of building these estimates is _boundary analysis_ which uses
+interval arithmetic to take an expression such as `a > 2500 AND a <= 5000` and
+build an accurate selectivity estimate that can then be used to find more efficient
+plans.
+
+#### `AnalysisContext` API
+
+The `AnalysisContext` serves as a shared knowledge base during expression evaluation

Review Comment:
   👍 will do !

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org