asolimando commented on code in PR #20846:
URL: https://github.com/apache/datafusion/pull/20846#discussion_r2913353871


##########
datafusion/physical-plan/src/union.rs:
##########
@@ -854,7 +853,7 @@ fn col_stats_union(
     mut left: ColumnStatistics,
     right: &ColumnStatistics,
 ) -> ColumnStatistics {
-    left.distinct_count = Precision::Absent;
+    left.distinct_count = left.distinct_count.add(&right.distinct_count).to_inexact();

Review Comment:
   @jonathanc-n, when min/max are available we can indeed do better by using that formula to update NDV. Taking the `max` is just the degenerate case that assumes 100% overlap across the merged relations (whether partitions or union'ed relations).
   
   This would directly address @xudong963's [concern](https://github.com/apache/datafusion/pull/19957#discussion_r2897501498) about disjoint domains: the min/max ranges would show near-zero overlap, so the formula naturally approaches `sum` instead of `max`.
   
   The formula assumes a uniform distribution within the min/max range. That is a classic assumption when working with scalar statistics, and with only min/max/NDV we can't easily do better. (Richer statistics, such as those in [StatisticsV2](https://github.com/apache/datafusion/pull/14699), could help, but let's reason within the current statistics framework for now.)
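   For illustration, here is a minimal sketch of one overlap-based NDV estimate under the uniform-distribution assumption. It is not necessarily the exact formula discussed in this thread. `RangeStats`, `overlap_fraction`, and `estimate_union_ndv` are hypothetical names, and the sketch uses plain `f64` bounds rather than DataFusion's `ColumnStatistics`/`Precision` types. Identical ranges reduce the estimate to `max`, and disjoint ranges reduce it to `sum`.

```rust
/// Hypothetical per-input column statistics: NDV plus the observed [min, max] range.
#[derive(Debug, Clone, Copy)]
struct RangeStats {
    ndv: f64,
    min: f64,
    max: f64,
}

/// Fraction of `a`'s value range that lies inside `b`'s range,
/// assuming values are uniformly distributed within [min, max].
fn overlap_fraction(a: &RangeStats, b: &RangeStats) -> f64 {
    let width = a.max - a.min;
    if width <= 0.0 {
        // Single-value range: either fully inside `b` or not at all.
        return if b.min <= a.min && a.min <= b.max { 1.0 } else { 0.0 };
    }
    let overlap = (a.max.min(b.max) - a.min.max(b.min)).max(0.0);
    overlap / width
}

/// Hypothetical helper: estimated NDV of the union of two inputs.
fn estimate_union_ndv(left: &RangeStats, right: &RangeStats) -> f64 {
    let fl = overlap_fraction(left, right);
    let fr = overlap_fraction(right, left);
    // Distinct values outside the shared range are counted once per input.
    let disjoint_part = left.ndv * (1.0 - fl) + right.ndv * (1.0 - fr);
    // Inside the shared range, assume the smaller value set is contained in the larger.
    let shared_part = (left.ndv * fl).max(right.ndv * fr);
    disjoint_part + shared_part
}

fn main() {
    let a = RangeStats { ndv: 100.0, min: 0.0, max: 100.0 };
    let same = RangeStats { ndv: 50.0, min: 0.0, max: 100.0 };
    let disjoint = RangeStats { ndv: 50.0, min: 200.0, max: 300.0 };
    let partial = RangeStats { ndv: 50.0, min: 50.0, max: 150.0 };

    println!("{}", estimate_union_ndv(&a, &same)); // 100: full overlap degenerates to max
    println!("{}", estimate_union_ndv(&a, &disjoint)); // 150: disjoint ranges give the sum
    println!("{}", estimate_union_ndv(&a, &partial)); // 125: partial overlap lands in between
}
```

   The "smaller set contained in the larger" choice for the shared range is itself an assumption. It keeps the estimate between `max` and `sum`, matching the two extremes described above.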
   
   I will add a comment in https://github.com/apache/datafusion/pull/19957 to make sure we capture this discussion. Thanks for the ping.
   
   EDIT: added 
[here](https://github.com/apache/datafusion/pull/19957#discussion_r2913413380)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

