isidentical opened a new issue, #3929: URL: https://github.com/apache/arrow-datafusion/issues/3929
This is a meta issue for improving cost calculations and cost-based optimizations (CBOs) in DataFusion. We already collect some statistics (mainly from the table sources), and some of the execution plan nodes estimate statistics of their own. The overall idea is to improve these, as well as the possible CBOs.

### Main Goals
- Have enough statistics to start nested join optimizations (#3843). This involves being able to estimate the weight of a join side and to reorder join sides globally, minimizing the overall cost of parent joins by reducing the output as much as possible at the bottom levels.
- Provide a more reliable static analysis phase for physical execution operators (so that range-based pruning and predicate pruning can leverage the existing infrastructure in their implementations).
- What else?

### Work in Progress
- [ ] https://github.com/apache/arrow-datafusion/issues/3898
- [ ] https://github.com/apache/arrow-datafusion/issues/3845
- What else?

### Planned
- [ ] Estimating join cardinalities when the underlying table does not have any statistics (https://github.com/apache/arrow-datafusion/issues/3813#issuecomment-1276643214).
- What else?
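
To make the join-side goal concrete, here is a rough sketch of what "guessing the weight of a join side" and ordering sides by it could look like. Everything in it is hypothetical: `SideStats`, `estimate_join_rows`, and `order_by_weight` are illustrative names, not DataFusion's `Statistics` type or any existing API. The estimate is the textbook equi-join formula (rows on the left times rows on the right, divided by the larger distinct-key count), used here only as an assumption, not as what DataFusion implements.

```rust
/// Hypothetical per-input statistics; NOT DataFusion's `Statistics` type.
#[derive(Debug, Clone, PartialEq)]
struct SideStats {
    name: String,
    /// Estimated row count, or `None` when the source reports nothing.
    num_rows: Option<u64>,
    /// Estimated number of distinct join-key values, if known.
    distinct_keys: Option<u64>,
}

/// Textbook equi-join estimate: |L| * |R| / max(ndv(L), ndv(R)).
/// Returns `None` when any required statistic is missing.
fn estimate_join_rows(left: &SideStats, right: &SideStats) -> Option<u64> {
    let l = left.num_rows?;
    let r = right.num_rows?;
    // Clamp to 1 so an empty input cannot cause a division by zero.
    let ndv = left.distinct_keys?.max(right.distinct_keys?).max(1);
    Some(l.saturating_mul(r) / ndv)
}

/// Order sides so the smallest known inputs come first. Sides without a
/// row count sort last; the sort is stable, so ties keep their order.
fn order_by_weight(mut sides: Vec<SideStats>) -> Vec<SideStats> {
    sides.sort_by_key(|s| s.num_rows.unwrap_or(u64::MAX));
    sides
}
```

A real optimizer would have to choose the order across nested joins rather than sort the inputs once, and the `None` branch is exactly the "no statistics" case listed under Planned.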
