asolimando commented on issue #18628:
URL: https://github.com/apache/datafusion/issues/18628#issuecomment-3526607780

   NDV is also generally used for aggregation pushdown, as it helps in 
understanding how the number of groups relates to the number of tuples, 
ranging from the case where you group over a primary key (`#tuples = #groups`) 
to grouping over a single-valued column (`#groups = 1`).
   
   Roughly `#groups(c) = NDV(c)`, where `c` is a column (some form of 
interpolation is needed for a multi-column group-by, starting from the 
individual NDVs of the given columns).
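   
   To make the multi-column case concrete, here is a minimal sketch of one 
common heuristic: take the product of the per-column NDVs (which assumes the 
columns are independent) and cap it at the row count. The function name 
`estimate_groups` and its parameters are hypothetical and are not part of 
DataFusion's API; a real planner would also need to handle missing or inexact 
statistics.

```python
from math import prod


def estimate_groups(num_rows: int, ndvs: list[int]) -> int:
    """Estimate the number of groups produced by a GROUP BY.

    num_rows: estimated number of input tuples.
    ndvs: estimated number of distinct values for each grouping column.

    Assumption (illustrative only): grouping columns are independent, so the
    product of their NDVs is an upper bound, capped by the number of tuples.
    """
    if num_rows == 0:
        return 0  # no input tuples, so no groups
    if not ndvs:
        return 1  # no grouping columns: a single global group
    return min(num_rows, prod(ndvs))


# Grouping over a primary key: one group per tuple.
print(estimate_groups(1000, [1000]))
# Grouping over a single-valued column: one group.
print(estimate_groups(1000, [1]))
# Two columns: product of NDVs, capped by the row count.
print(estimate_groups(1000, [10, 20]))
print(estimate_groups(1000, [100, 100]))
```

   The independence assumption tends to overestimate the number of groups 
when the columns are correlated, which is why this is best read as an upper 
bound rather than a point estimate.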
   
   https://github.com/apache/datafusion/pull/11627 introduced a runtime 
optimization to this effect, skipping the partial aggregation when it doesn't 
see enough reduction, as you would end up doing the same work twice.
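   
   The idea behind such a runtime check can be sketched as follows. This is 
not the code from that PR: the class `PartialAggProbe`, its fields, and the 
threshold values are hypothetical and chosen only for illustration. The probe 
watches the first rows flowing through the partial aggregation and recommends 
skipping it when the ratio of groups to input rows stays close to 1.

```python
from dataclasses import dataclass


@dataclass
class PartialAggProbe:
    # Both thresholds are illustrative values, not taken from the source.
    probe_rows: int = 100_000       # rows to observe before deciding
    ratio_threshold: float = 0.8    # groups/rows ratio above which we skip
    rows_seen: int = 0
    groups_seen: int = 0

    def update(self, batch_rows: int, new_groups: int) -> None:
        """Record one input batch and the number of new groups it created."""
        self.rows_seen += batch_rows
        self.groups_seen += new_groups

    def should_skip(self) -> bool:
        """Return True when partial aggregation is not reducing the input."""
        if self.rows_seen < self.probe_rows:
            return False  # not enough evidence yet
        return self.groups_seen / self.rows_seen >= self.ratio_threshold


# Nearly every row opens a new group, so partial aggregation is wasted work.
probe = PartialAggProbe(probe_rows=100, ratio_threshold=0.8)
probe.update(batch_rows=50, new_groups=50)
probe.update(batch_rows=50, new_groups=45)
print(probe.should_skip())
```

   With an NDV estimate available at planning time, the same ratio could in 
principle be evaluated before execution starts, rather than after probing the 
first batches.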
   
   It would indeed be nice to have the planner make use of NDV to improve the 
plan.
   
   (This seems more like an enhancement than a bug: it's a missed opportunity, 
but it doesn't affect correctness.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

