This is an automated email from the ASF dual-hosted git repository.

github-merge-queue[bot] pushed a commit to branch 
gh-readonly-queue/main/pr-22074-484bddbd95d1ad1c67553b8b6004e4d7397f77bd
in repository https://gitbox.apache.org/repos/asf/datafusion.git

commit ac29ab619fc17d88146716b05b625579a4fbbc6b
Author: kosiew <[email protected]>
AuthorDate: Wed May 13 04:10:15 2026 +0800

    Add sqllogictest coverage for unused UNNEST pruning edge cases (#22074)
    
    ## Which issue does this PR close?
    
    * Part of #20118
    
    ## Rationale for this change
    
    This PR adds test-only coverage documenting the current optimization gap
    around unused `UNNEST` outputs.
    
    The new tests capture a case where the unnested column becomes
    duplicate-insensitive under a `GROUP BY`, while also documenting
    counterexamples where removing `UNNEST` would incorrectly change row
    cardinality or null/empty-array semantics. These tests are intended to
    guide future optimizer work without changing current behavior.
    
    ## What changes are included in this PR?
    
    * Added a regression dataset in `sqllogictest/test_files/unnest.slt`.
    * Added a reproducer showing an unused `UNNEST` output under `GROUP BY`.
    * Added `EXPLAIN` assertions documenting that the current logical and
    physical plans still contain `Unnest` / `UnnestExec`.
    * Added counterexamples demonstrating cases where removing `UNNEST`
    would change result cardinality.
    * Added coverage for empty and `NULL` array semantics to document
    current select-list `UNNEST` behavior.
    * Added cleanup for the temporary test table.
    
    ## Are these changes tested?
    
    Yes.
    
    This PR adds SQL logic tests in
    `datafusion/sqllogictest/test_files/unnest.slt`, including:
    
    * A reproducer for unused `UNNEST` output below `GROUP BY`
    * `EXPLAIN` plan assertions for `Unnest` and `UnnestExec`
    * Cardinality-sensitive counterexamples
    * Empty/NULL array semantic coverage
    
    ## Are there any user-facing changes?
    
    No. This PR only adds tests and documentation of current behavior; it
    does not change optimizer behavior or query semantics.
    
    ## LLM-generated code disclosure
    
    This PR includes LLM-generated code and comments. All LLM-generated
    content has been manually reviewed and tested.
---
 datafusion/sqllogictest/test_files/unnest.slt | 85 +++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/datafusion/sqllogictest/test_files/unnest.slt 
b/datafusion/sqllogictest/test_files/unnest.slt
index ac01b7d1f7..faeb5d5957 100644
--- a/datafusion/sqllogictest/test_files/unnest.slt
+++ b/datafusion/sqllogictest/test_files/unnest.slt
@@ -1334,3 +1334,88 @@ select unnest(arrow_cast(['a','b','c'], 
'LargeListView(Utf8)'));
 a
 b
 c
+
+###############################################
+# Unused UNNEST output projection-pruning cases #
+###############################################
+
+statement ok
+CREATE TABLE unused_unnest_pruning AS
+SELECT * FROM (
+    VALUES
+        (1, make_array(10, 20)),
+        (2, make_array()),
+        (3, arrow_cast(NULL, 'List(Int64)'))
+) AS t(id, arr);
+
+# Reproducer for the optimization gap: the unused `elem` output is 
duplicate-insensitive
+# below this GROUP BY, but the current plan still keeps Unnest/UnnestExec.
+query I
+SELECT id
+FROM (
+    SELECT id, UNNEST(make_array(1, 2, 3)) AS elem
+    FROM unused_unnest_pruning
+)
+GROUP BY id
+ORDER BY id;
+----
+1
+2
+3
+
+query TT
+EXPLAIN SELECT id
+FROM (
+    SELECT id, UNNEST(make_array(1, 2, 3)) AS elem
+    FROM unused_unnest_pruning
+)
+GROUP BY id;
+----
+logical_plan
+<slt:ignore>Unnest:<slt:ignore>
+physical_plan
+<slt:ignore>UnnestExec<slt:ignore>
+
+# Counterexample: removing UNNEST here would change cardinality.
+query I rowsort
+SELECT id
+FROM (
+    SELECT id, UNNEST(make_array(1, 2, 3)) AS elem
+    FROM unused_unnest_pruning
+);
+----
+1
+1
+1
+2
+2
+2
+3
+3
+3
+
+# Counterexample with data-dependent empty/null arrays: current select-list
+# UNNEST semantics drop rows for the empty and NULL arrays in this dataset.
+# This documents the preserve-nulls/cardinality contract that follow-up 
optimizer
+# rules must preserve; removing UNNEST here would return ids 2 and 3 too.
+query II rowsort
+SELECT id, elem
+FROM (
+    SELECT id, UNNEST(arr) AS elem
+    FROM unused_unnest_pruning
+);
+----
+1 10
+1 20
+
+query I
+SELECT COUNT(*)
+FROM (
+    SELECT id, UNNEST(arr) AS elem
+    FROM unused_unnest_pruning
+);
+----
+2
+
+statement ok
+DROP TABLE unused_unnest_pruning;


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to