xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002503571


##########
datafusion/expr-common/src/statistics.rs:
##########
@@ -203,6 +203,121 @@ impl Distribution {
         };
         Ok(dt)
     }
+
+    /// Merges two distributions into a single distribution that represents their combined statistics.
+    /// This creates a more general distribution that approximates the mixture of the input distributions.
+    pub fn merge(&self, other: &Self) -> Result<Self> {
+        let range_a = self.range()?;
+        let range_b = other.range()?;
+
+        // Determine data type and create combined range
+        let combined_range = range_a.union(&range_b)?;
+
+        // Calculate weights for the mixture distribution
+        let (weight_a, weight_b) = match (range_a.cardinality(), range_b.cardinality()) {
+            (Some(ca), Some(cb)) => {
+                let total = (ca + cb) as f64;
+                ((ca as f64) / total, (cb as f64) / total)

Review Comment:
   Yes, how to compute the weights is discussed in this thread: 
https://github.com/apache/datafusion/pull/15296#discussion_r2002041862. I 
propose a new approach that takes the safeguard into account.
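
   The linked thread is not reproduced in this message, so the following is only a rough sketch of what a safeguard around the weight computation could look like, not the approach proposed there. `mixture_weights` is a hypothetical standalone helper (it is not part of DataFusion), and the fallback to equal weights is an assumption made for illustration. It guards two cases visible in the quoted lines: `ca + cb` overflowing before the cast, and a zero total producing NaN.

```rust
/// Hypothetical helper, not part of DataFusion: derives mixture weights
/// from two optional range cardinalities.
fn mixture_weights(card_a: Option<u64>, card_b: Option<u64>) -> (f64, f64) {
    match (card_a, card_b) {
        (Some(ca), Some(cb)) => {
            // Cast before adding so the sum cannot overflow u64.
            let (ca, cb) = (ca as f64, cb as f64);
            let total = ca + cb;
            if total > 0.0 {
                (ca / total, cb / total)
            } else {
                // Both cardinalities are zero: avoid 0/0 = NaN.
                // Equal weights are an assumed fallback.
                (0.5, 0.5)
            }
        }
        // Cardinality unknown on either side: assumed fallback to equal weights.
        _ => (0.5, 0.5),
    }
}
```

   With this sketch, `mixture_weights(Some(30), Some(10))` yields `(0.75, 0.25)`, while an unknown or all-zero input yields `(0.5, 0.5)` instead of NaN.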



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

