kosiew opened a new pull request, #16258: URL: https://github.com/apache/datafusion/pull/16258
## Which issue does this PR close?

Closes #16228

## Rationale for this change

`Array::is_null` does not detect nulls in a `DictionaryArray` when a valid key (index) points to a null entry in the values array, because it only consults the null buffer of the keys. As a result, aggregations such as `count(distinct ...)`, which should skip nulls, may count them instead. This change ensures that nulls in the dictionary values are detected and excluded.

[Arrow's hands are tied on this matter](https://github.com/apache/arrow-rs/pull/7608), so we are fixing the issue in this repo.

## What changes are included in this PR?

- Updated `DistinctCountAccumulator` to use `ScalarValue::is_null()` instead of relying solely on `Array::is_null()` to determine null entries.
- Added SQL logic tests confirming correct behavior when a `DictionaryArray` contains only null values.

## Are these changes tested?

Yes. Tests have been added to `sqllogictest/test_files/aggregate.slt` to verify that `count(distinct ...)` returns `0` when all dictionary values are null. The tests cover both the query logic and the table lifecycle (create/drop).

## Are there any user-facing changes?

Yes. This change corrects the results of `count(distinct ...)` queries on `DictionaryArray` columns whose values array contains nulls. Users can now expect consistent, correct results across different partition settings and query plans.
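To illustrate the difference described in the rationale, here is a minimal, dependency-free sketch that models a dictionary array with plain Rust types. `DictArray`, `physical_is_null`, `logical_value`, `count_distinct_physical`, and `count_distinct_logical` are hypothetical names for illustration only; they are not arrow-rs or DataFusion APIs, and the real `DistinctCountAccumulator` is structured differently. The point is only that checking the key's null bit misses nulls stored in the values, while checking the decoded value (as `ScalarValue::is_null()` does on a decoded row) catches them.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified model of a dictionary array: each row has an
/// optional key (`None` = null key), and each key indexes into `values`,
/// which may itself contain nulls.
struct DictArray {
    keys: Vec<Option<usize>>,
    values: Vec<Option<String>>,
}

impl DictArray {
    fn len(&self) -> usize {
        self.keys.len()
    }

    /// Models the behavior of `Array::is_null` on a dictionary: only the
    /// key's null bit is consulted, so a valid key pointing at a null value
    /// is reported as non-null.
    fn physical_is_null(&self, row: usize) -> bool {
        self.keys[row].is_none()
    }

    /// Resolves the key and returns the referenced value; `None` means the
    /// row is logically null (null key OR null dictionary value).
    fn logical_value(&self, row: usize) -> Option<&str> {
        self.keys[row].and_then(|k| self.values[k].as_deref())
    }
}

/// Buggy variant: skips a row only when its key is null, so a null
/// dictionary value is inserted into the set and counted.
fn count_distinct_physical(arr: &DictArray) -> usize {
    let mut seen: HashSet<Option<&str>> = HashSet::new();
    for row in 0..arr.len() {
        if !arr.physical_is_null(row) {
            seen.insert(arr.logical_value(row));
        }
    }
    seen.len()
}

/// Fixed variant: decodes each row and skips it when the decoded value is
/// null, mirroring a check on the decoded scalar rather than the keys.
fn count_distinct_logical(arr: &DictArray) -> usize {
    let mut seen: HashSet<&str> = HashSet::new();
    for row in 0..arr.len() {
        if let Some(v) = arr.logical_value(row) {
            seen.insert(v);
        }
    }
    seen.len()
}

fn main() {
    // Every key is valid, but the only dictionary value is null.
    let all_null = DictArray {
        keys: vec![Some(0), Some(0), Some(0)],
        values: vec![None],
    };
    println!("physical: {}", count_distinct_physical(&all_null)); // physical: 1
    println!("logical:  {}", count_distinct_logical(&all_null)); // logical:  0
}
```

In the all-null case the key-only check counts one "distinct" value (the null), while the decoded check returns `0`, which is the result the new `aggregate.slt` tests expect.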
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org