2010YOUY01 commented on issue #21120: URL: https://github.com/apache/datafusion/issues/21120#issuecomment-4131266771
> This proposal is, however, a little different from the other efforts tracked by existing epics like [#8227](https://github.com/apache/datafusion/issues/8227) and [#20766](https://github.com/apache/datafusion/issues/20766): it aims at introducing tooling that enables an override mechanism for downstream projects (with just a reasonable default). This links to [@paleolimbot](https://github.com/paleolimbot)'s [interest](https://github.com/apache/datafusion/discussions/21017#discussioncomment-16228504) (stats propagation for a specific type of stats) and your work on [#19609](https://github.com/apache/datafusion/pull/19609): since people use statistics in very different ways, we need to provide override mechanisms.
>
> To make things a little more concrete, I could take a few examples from our existing benchmarks (probably TPC-DS) and showcase how planning could be improved with better statistics and propagation, as a motivating example.
>
> WDYT?

This makes sense. At the API framework stage, I think the goal is clear. I was thinking a little ahead: once we start adding expression/operator coverage for advanced statistics like NDV, that work should be workload-guided.

It also looks like the APIs for https://github.com/apache/datafusion/pull/21122 and [#19609](https://github.com/apache/datafusion/pull/19609) can be unified; I’ll try to help with the review.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
