This is an automated email from the ASF dual-hosted git repository. github-merge-queue[bot] pushed a commit to branch gh-readonly-queue/main/pr-22074-484bddbd95d1ad1c67553b8b6004e4d7397f77bd in repository https://gitbox.apache.org/repos/asf/datafusion.git
commit ac29ab619fc17d88146716b05b625579a4fbbc6b Author: kosiew <[email protected]> AuthorDate: Wed May 13 04:10:15 2026 +0800 Add sqllogictest coverage for unused UNNEST pruning edge cases (#22074) ## Which issue does this PR close? * Part of #20118 ## Rationale for this change This PR adds test-only coverage documenting the current optimization gap around unused `UNNEST` outputs. The new tests capture a case where the unnested column becomes duplicate-insensitive under a `GROUP BY`, while also documenting counterexamples where removing `UNNEST` would incorrectly change row cardinality or null/empty-array semantics. These tests are intended to guide future optimizer work without changing current behavior. ## What changes are included in this PR? * Added a regression dataset in `sqllogictest/test_files/unnest.slt`. * Added a reproducer showing an unused `UNNEST` output under `GROUP BY`. * Added `EXPLAIN` assertions documenting that the current logical and physical plans still contain `Unnest` / `UnnestExec`. * Added counterexamples demonstrating cases where removing `UNNEST` would change result cardinality. * Added coverage for empty and `NULL` array semantics to document current select-list `UNNEST` behavior. * Added cleanup for the temporary test table. ## Are these changes tested? Yes. This PR adds SQL logic tests in `datafusion/sqllogictest/test_files/unnest.slt`, including: * A reproducer for unused `UNNEST` output below `GROUP BY` * `EXPLAIN` plan assertions for `Unnest` and `UnnestExec` * Cardinality-sensitive counterexamples * Empty/NULL array semantic coverage ## Are there any user-facing changes? No. This PR only adds tests and documentation of current behavior; it does not change optimizer behavior or query semantics. ## LLM-generated code disclosure This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested. --- datafusion/sqllogictest/test_files/unnest.slt | 85 +++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) diff --git a/datafusion/sqllogictest/test_files/unnest.slt b/datafusion/sqllogictest/test_files/unnest.slt index ac01b7d1f7..faeb5d5957 100644 --- a/datafusion/sqllogictest/test_files/unnest.slt +++ b/datafusion/sqllogictest/test_files/unnest.slt @@ -1334,3 +1334,88 @@ select unnest(arrow_cast(['a','b','c'], 'LargeListView(Utf8)')); a b c + +############################################### +# Unused UNNEST output projection-pruning cases # +############################################### + +statement ok +CREATE TABLE unused_unnest_pruning AS +SELECT * FROM ( + VALUES + (1, make_array(10, 20)), + (2, make_array()), + (3, arrow_cast(NULL, 'List(Int64)')) +) AS t(id, arr); + +# Reproducer for the optimization gap: the unused `elem` output is duplicate-insensitive +# below this GROUP BY, but the current plan still keeps Unnest/UnnestExec. +query I +SELECT id +FROM ( + SELECT id, UNNEST(make_array(1, 2, 3)) AS elem + FROM unused_unnest_pruning +) +GROUP BY id +ORDER BY id; +---- +1 +2 +3 + +query TT +EXPLAIN SELECT id +FROM ( + SELECT id, UNNEST(make_array(1, 2, 3)) AS elem + FROM unused_unnest_pruning +) +GROUP BY id; +---- +logical_plan +<slt:ignore>Unnest:<slt:ignore> +physical_plan +<slt:ignore>UnnestExec<slt:ignore> + +# Counterexample: removing UNNEST here would change cardinality. +query I rowsort +SELECT id +FROM ( + SELECT id, UNNEST(make_array(1, 2, 3)) AS elem + FROM unused_unnest_pruning +); +---- +1 +1 +1 +2 +2 +2 +3 +3 +3 + +# Counterexample with data-dependent empty/null arrays: current select-list +# UNNEST semantics drop rows for the empty and NULL arrays in this dataset. +# This documents the preserve-nulls/cardinality contract that follow-up optimizer +# rules must preserve; removing UNNEST here would return ids 2 and 3 too. +query II rowsort +SELECT id, elem +FROM ( + SELECT id, UNNEST(arr) AS elem + FROM unused_unnest_pruning +); +---- +1 10 +1 20 + +query I +SELECT COUNT(*) +FROM ( + SELECT id, UNNEST(arr) AS elem + FROM unused_unnest_pruning +); +---- +2 + +statement ok +DROP TABLE unused_unnest_pruning; --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
