alamb commented on code in PR #15050:
URL: https://github.com/apache/datafusion/pull/15050#discussion_r1985767837


##########
datafusion/optimizer/src/decorrelate.rs:
##########
@@ -56,10 +56,14 @@ pub struct PullUpCorrelatedExpr {
     /// Indicates if we encounter any correlated expression that can not be pulled up
     /// above a aggregation without changing the meaning of the query.
     can_pull_over_aggregation: bool,
-    /// Do we need to handle [the Count bug] during the pull up process.
-    /// TODO this parameter should be removed or renamed semantically
+    /// Do we need to handle the [count bug] during the pull up process.
     ///
-    /// [the Count bug]: https://github.com/apache/datafusion/issues/10553
+    /// The "count bug" was described in [Optimization of Nested SQL
+    /// Queries Revisited](https://dl.acm.org/doi/pdf/10.1145/38714.38723). This bug is
+    /// not specific to the COUNT function, and it can occur with any aggregate function,
+    /// such as SUM, AVG, etc. The anomaly arises because aggregates fail to distinguish
+    /// between an empty set and null values when optimizing a correlated query as a join.
+    /// Here, we use "the count bug" to refer to all such cases.

Review Comment:
   Thank you @suibianwanwank  -- this is great! I learned something new!
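To make the anomaly described in the doc comment concrete, here is a minimal sketch in plain Rust that uses no DataFusion APIs. It models the illustrative query `SELECT id FROM t WHERE cnt = (SELECT COUNT(*) FROM s WHERE s.tid = t.id)` three ways: correct nested-loop evaluation, a naive GROUP BY + LEFT JOIN rewrite, and a rewrite that substitutes COUNT's value on empty input. The table shapes, function names, and the use of `None` for SQL NULL are all hypothetical modeling choices; this is not how the optimizer in `decorrelate.rs` is implemented.

```rust
use std::collections::HashMap;

// Hypothetical model: `outer` rows are (id, expected_count); `inner` holds s.tid values.

/// Correct semantics: evaluate the correlated COUNT(*) once per outer row.
fn nested_loop(outer: &[(i32, i64)], inner: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    for &(id, expected) in outer {
        // COUNT(*) over an empty match set is 0, not NULL.
        let count = inner.iter().filter(|&&k| k == id).count() as i64;
        if count == expected {
            out.push(id);
        }
    }
    out
}

/// GROUP BY s.tid: only keys that actually appear in `inner` get a group.
fn group_counts(inner: &[i32]) -> HashMap<i32, i64> {
    let mut groups = HashMap::new();
    for &k in inner {
        *groups.entry(k).or_insert(0) += 1;
    }
    groups
}

/// Naive decorrelation: LEFT JOIN the grouped counts on t.id = s.tid.
fn naive_decorrelated(outer: &[(i32, i64)], inner: &[i32]) -> Vec<i32> {
    let groups = group_counts(inner);
    let mut out = Vec::new();
    for &(id, expected) in outer {
        // A missing group comes back as NULL (None) from the outer join,
        // and `NULL = expected` is never true, so the row is wrongly dropped.
        let count: Option<i64> = groups.get(&id).copied();
        if count == Some(expected) {
            out.push(id);
        }
    }
    out
}

/// Decorrelation with the count bug handled: replace the NULL produced by
/// the outer join with COUNT's value on empty input (0).
fn fixed_decorrelated(outer: &[(i32, i64)], inner: &[i32]) -> Vec<i32> {
    let groups = group_counts(inner);
    let mut out = Vec::new();
    for &(id, expected) in outer {
        let count = groups.get(&id).copied().unwrap_or(0);
        if count == expected {
            out.push(id);
        }
    }
    out
}

fn main() {
    // Row 2 expects a count of 0 and has no matching rows in `inner`.
    let outer = [(1, 2), (2, 0)];
    let inner = [1, 1];
    println!("{:?}", nested_loop(&outer, &inner)); // [1, 2]
    println!("{:?}", naive_decorrelated(&outer, &inner)); // [1]
    println!("{:?}", fixed_decorrelated(&outer, &inner)); // [1, 2]
}
```

Only the row with an empty match set differs, which is exactly the "empty set vs. NULL" confusion the comment describes. The `unwrap_or(0)` fix is specific to COUNT in this sketch; for other aggregates the substituted value would be whatever that aggregate expression evaluates to on empty input.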



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
