kosiew opened a new pull request, #16466: URL: https://github.com/apache/datafusion/pull/16466
## Which issue does this PR close?

- Closes #16266

## Rationale for this change

This change addresses a bug where `combine_hashes` was applied even if a dictionary value was null, leading to incorrect hash computations. This was discovered while investigating #16266.

Additionally, this PR extends the test coverage for aggregate functions to better validate behavior with dictionary arrays containing nulls.

## What changes are included in this PR?

- Fixes logic in `hash_dictionary` to ensure `combine_hashes` is only applied when the dictionary value is valid.
- Corrects grammar in error messages for dataset generation expectations.
- Enables null value generation in fuzz tests for dictionary arrays.
- Adds comprehensive tests for aggregate functions (`COUNT`, `SUM`, `MIN`, `MAX`, `MEDIAN`, `FIRST_VALUE`, `LAST_VALUE`) using dictionary arrays with null keys and values.
- Ensures consistent behavior across single and multi-partition execution.

## Are these changes tested?

Yes, extensive new tests are added covering:

- Aggregates on dictionary columns with null keys/values.
- Window functions with null handling (IGNORE/RESPECT NULLS).
- Partitioned vs. unpartitioned execution consistency.

## Are there any user-facing changes?

No direct API changes, but query behavior involving dictionary arrays with nulls will now produce correct and consistent results in line with SQL semantics.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
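
To illustrate the `hash_dictionary` fix described in this PR, here is a minimal standalone sketch of the idea. It is not DataFusion's code: the toy dictionary representation (a slice of optional keys indexing into a slice of optional string values) and the helpers `combine`, `hash_str`, and `hash_dictionary_rows` are hypothetical names made up for this example. The only point it tries to show is the guard itself: a row's running hash is combined with a value hash only when both the key and the value it points to are non-null.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a hash-combining helper.
/// This is an assumption for illustration, not DataFusion's `combine_hashes`.
fn combine(l: u64, r: u64) -> u64 {
    l.wrapping_mul(37).wrapping_add(r)
}

/// Hash a single string value with the standard library hasher.
fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

/// Toy model of a dictionary column: each key indexes into `values`,
/// and both keys and values may be null (`None`).
/// `hashes` holds one running hash per row and is updated in place.
fn hash_dictionary_rows(
    keys: &[Option<usize>],
    values: &[Option<&str>],
    hashes: &mut [u64],
) {
    // Hash each dictionary value once; `None` marks a null value.
    let value_hashes: Vec<Option<u64>> =
        values.iter().map(|v| v.map(hash_str)).collect();

    for (row_hash, key) in hashes.iter_mut().zip(keys) {
        // A null key, an out-of-range key, or a null value all resolve to `None`.
        let resolved = key.and_then(|k| value_hashes.get(k).copied().flatten());

        // The guard: combine only when the dictionary value is valid.
        // Rows that resolve to null keep their incoming hash unchanged.
        if let Some(value_hash) = resolved {
            *row_hash = combine(*row_hash, value_hash);
        }
    }
}
```

Under this sketch, a row whose key is null and a row whose key points at a null value are treated the same way: neither contributes a value hash, so both keep the hash they started with.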