2010YOUY01 commented on issue #21120:
URL: https://github.com/apache/datafusion/issues/21120#issuecomment-4131266771

   > This proposal is, however, a little different from the other efforts 
tracked by existing epics like 
[#8227](https://github.com/apache/datafusion/issues/8227) and 
[#20766](https://github.com/apache/datafusion/issues/20766): it aims at 
introducing tooling that enables an override mechanism for downstream projects (with 
just a reasonable default). This links to 
[@paleolimbot](https://github.com/paleolimbot)'s 
[interest](https://github.com/apache/datafusion/discussions/21017#discussioncomment-16228504)
 (stats propagation for a specific type of stats) and your work on 
[#19609](https://github.com/apache/datafusion/pull/19609): since people use 
statistics in very different ways, we need to provide an override mechanism.
   > 
   > To make things a little more concrete, I could take a few examples from 
our existing benchmarks (probably TPC-DS), and showcase how planning could be 
improved with better statistics and propagation, like a motivating example.
   > 
   > WDYT?
   
   This makes sense. At the API framework stage, the goal is clear. 
I was thinking a little further ahead: when we later add expression/operator 
coverage for advanced statistics like NDV, that work should be workload-guided.
   
   Also, it looks like the APIs of 
https://github.com/apache/datafusion/pull/21122 and 
[#19609](https://github.com/apache/datafusion/pull/19609) can be unified. I’ll 
try to help with the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

