jonahgao commented on PR #8124: URL: https://github.com/apache/arrow-datafusion/pull/8124#issuecomment-1806950843
`sum` and `count` behave differently on empty input: `count` returns 0, while `sum` returns NULL. Rewriting `count` as `count` + `sum` can therefore produce incorrect results. I tested on this branch:

```sh
DataFusion CLI v33.0.0
❯ create table t(a int, b int);
0 rows in set. Query took 0.016 seconds.

❯ select count(distinct a), count(b) from t;
+---------------------+------------+
| COUNT(DISTINCT t.a) | COUNT(t.b) |
+---------------------+------------+
| 0                   |            |
+---------------------+------------+
1 row in set. Query took 0.029 seconds.
```

`COUNT(t.b)` should be 0, not NULL.

Maybe we need a special `sum` function like `SqlSumEmptyIsZero` in Calcite, or we could just skip the optimization for `count`.
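To make the mismatch easy to reproduce outside DataFusion, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in engine. The rewritten query is a simplified shape assumed for illustration (an inner grouped `count` with an outer `sum` over the partial counts), not the plan this PR actually generates. The `COALESCE` wrapper is only an approximation of a "sum that treats empty input as zero", and the function name `empty_input_aggregates` is hypothetical.

```python
import sqlite3


def empty_input_aggregates():
    """Compare COUNT with a COUNT-then-SUM rewrite on an empty table.

    SQLite is used here only as a stand-in for standard SQL aggregate
    semantics; this does not run DataFusion.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INT, b INT)")

    # Direct COUNT over zero rows yields 0.
    direct_count = conn.execute("SELECT COUNT(b) FROM t").fetchone()[0]

    # Assumed rewrite shape: partial counts per group, then SUM of the partials.
    # The empty table produces no groups, so SUM sees no rows and yields NULL.
    rewritten = conn.execute(
        "SELECT SUM(cnt) FROM (SELECT COUNT(b) AS cnt FROM t GROUP BY a)"
    ).fetchone()[0]

    # Mapping the empty-input NULL back to 0 restores COUNT semantics.
    fixed = conn.execute(
        "SELECT COALESCE(SUM(cnt), 0) "
        "FROM (SELECT COUNT(b) AS cnt FROM t GROUP BY a)"
    ).fetchone()[0]

    conn.close()
    return direct_count, rewritten, fixed


if __name__ == "__main__":
    print(empty_input_aggregates())  # (0, None, 0)
```

The middle value is the NULL that appears under `COUNT(t.b)` in the output above. Because partial counts are never NULL themselves, coalescing the outer `sum` to 0 should only change the result in the empty-input case.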
