alamb commented on code in PR #15050:
URL: https://github.com/apache/datafusion/pull/15050#discussion_r1985767837

##########
datafusion/optimizer/src/decorrelate.rs:
##########
@@ -56,10 +56,14 @@ pub struct PullUpCorrelatedExpr {
     /// Indicates if we encounter any correlated expression that can not be pulled up
     /// above a aggregation without changing the meaning of the query.
     can_pull_over_aggregation: bool,
-    /// Do we need to handle [the Count bug] during the pull up process.
-    /// TODO this parameter should be removed or renamed semantically
+    /// Do we need to handle the [count bug] during the pull up process.
     ///
-    /// [the Count bug]: https://github.com/apache/datafusion/issues/10553
+    /// The "count bug" was described in [Optimization of Nested SQL
+    /// Queries Revisited](https://dl.acm.org/doi/pdf/10.1145/38714.38723). This bug is
+    /// not specific to the COUNT function, and it can occur with any aggregate function,
+    /// such as SUM, AVG, etc. The anomaly arises because aggregates fail to distinguish
+    /// between an empty set and null values when optimizing a correlated query as a join.
+    /// Here, we use "the count bug" to refer to all such cases.

Review Comment:
   Thank you @suibianwanwank -- this is great! I learned something new!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
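
For readers new to the anomaly described in the doc comment above, here is a minimal sketch of how it can show up. It is not DataFusion's implementation: the tables `t1(id)` and `t2(t1_id)`, and the functions `correlated`, `naive_join`, and `left_join_fixed`, are hypothetical names made up for illustration. The query being modeled is roughly `SELECT id FROM t1 WHERE (SELECT COUNT(*) FROM t2 WHERE t2.t1_id = t1.id) = 0`.

```rust
use std::collections::HashMap;

/// Reference semantics: evaluate the subquery once per outer row.
/// COUNT(*) over an empty set is 0, so unmatched outer rows qualify.
fn correlated(t1: &[i32], t2: &[i32]) -> Vec<i32> {
    t1.iter()
        .copied()
        .filter(|id| t2.iter().filter(|fk| *fk == id).count() == 0)
        .collect()
}

/// Models `SELECT t1_id, COUNT(*) FROM t2 GROUP BY t1_id`.
/// Keys with no rows in t2 never produce a group.
fn group_counts(t2: &[i32]) -> HashMap<i32, usize> {
    let mut counts = HashMap::new();
    for fk in t2 {
        *counts.entry(*fk).or_insert(0) += 1;
    }
    counts
}

/// Naive decorrelation: inner join against the grouped aggregate.
/// Outer rows without a matching group are dropped before the filter runs.
fn naive_join(t1: &[i32], t2: &[i32]) -> Vec<i32> {
    let counts = group_counts(t2);
    t1.iter()
        .copied()
        .filter_map(|id| counts.get(&id).map(|c| (id, *c)))
        .filter(|(_, c)| *c == 0)
        .map(|(id, _)| id)
        .collect()
}

/// One possible repair (an assumption, not necessarily what the PR does):
/// left join, then treat a missing group (NULL) as the aggregate's value
/// on empty input, which is 0 for COUNT.
fn left_join_fixed(t1: &[i32], t2: &[i32]) -> Vec<i32> {
    let counts = group_counts(t2);
    t1.iter()
        .copied()
        .map(|id| (id, counts.get(&id).copied().unwrap_or(0)))
        .filter(|(_, c)| *c == 0)
        .map(|(id, _)| id)
        .collect()
}

fn main() {
    let t1 = [1, 2, 3];
    let t2 = [1, 1, 3]; // no rows reference id 2
    println!("correlated:      {:?}", correlated(&t1, &t2)); // [2]
    println!("naive join:      {:?}", naive_join(&t1, &t2)); // []
    println!("left join fixed: {:?}", left_join_fixed(&t1, &t2)); // [2]
}
```

The naive rewrite silently loses the row with `id = 2`, because the join cannot tell "no group exists" apart from a real aggregate result. The same shape of problem can appear with other aggregates whenever the surrounding expression gives a non-NULL answer for empty input.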