ahirner opened a new issue, #10935: URL: https://github.com/apache/datafusion/issues/10935
### Is your feature request related to a problem or challenge? Currently where clauses in a sub query are not pushed below `Unnest` expressions. In conjunction with some other limitations, it composing views with `unnest()` not ideal. In `0.39.0` cli: ``` > CREATE TABLE IF NOT EXISTS v AS VALUES(1,[1,2,3]),(2,[3,4,5]); 0 row(s) fetched. Elapsed 0.008 seconds. ``` We get equal results with or w/o a subquery: ``` > select unnest(column2) from v where column1=2; +-------------------+ | unnest(v.column2) | +-------------------+ | 3 | | 4 | | 5 | +-------------------+ 3 row(s) fetched. Elapsed 0.011 seconds. ``` ``` > select "unnest(v.column2)" from (select unnest(column2), column1 from v) where column1=2; +-------------------+ | unnest(v.column2) | +-------------------+ | 3 | | 4 | | 5 | +-------------------+ 3 row(s) fetched. Elapsed 0.015 seconds. ``` `Filter` and `FilterExec` are above projections with `unnest`, instead of below. Ok: ``` > explain select unnest(column2) from v where column1=2; +---------------+------------------------------------------------------------------------+ | plan_type | plan | +---------------+------------------------------------------------------------------------+ | logical_plan | Unnest: lists[unnest(v.column2)] structs[] | | | Projection: v.column2 AS unnest(v.column2) | | | Filter: v.column1 = Int64(2) | | | TableScan: v projection=[column1, column2] | | physical_plan | UnnestExec | | | RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1 | | | ProjectionExec: expr=[column2@1 as unnest(v.column2)] | | | CoalesceBatchesExec: target_batch_size=8192 | | | FilterExec: column1@0 = 2 | | | MemoryExec: partitions=1, partition_sizes=[1] | | | | +---------------+------------------------------------------------------------------------+ ``` Problematic: ``` > explain select "unnest(v.column2)" from (select unnest(column2), column1 from v) where column1=2; +---------------+---------------------------------------------------------------------------------------+ | plan_type | plan | +---------------+---------------------------------------------------------------------------------------+ | logical_plan | Projection: unnest(v.column2) | | | Filter: v.column1 = Int64(2) | | | Unnest: lists[unnest(v.column2)] structs[] | | | Projection: v.column2 AS unnest(v.column2), v.column1 | | | TableScan: v projection=[column1, column2] | | physical_plan | ProjectionExec: expr=[unnest(v.column2)@0 as unnest(v.column2)] | | | CoalesceBatchesExec: target_batch_size=8192 | | | FilterExec: column1@1 = 2 | | | RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1 | | | UnnestExec | | | ProjectionExec: expr=[column2@1 as unnest(v.column2), column1@0 as column1] | | | MemoryExec: partitions=1, partition_sizes=[1] | | | | +---------------+---------------------------------------------------------------------------------------+ ``` ### Describe the solution you'd like `FilterExec` is pushed down to tables that support them. ### Describe alternatives you've considered A workaround is trying to not use subqueries. However, then one cannot unest structs with multiple lists, or unnest a column more than once. Those are also unsupported. ``` > select unnest(unnest([[1],[1,2],[1,2,3]])); type_coercion caused by This feature is not implemented: Unnest should be rewritten to LogicalPlan::Unnest before type coercion ``` ``` > select unnest(column2) as x, unnest(column2) as y from v; Error during planning: Projections require unique expression names but the expression "v.column2 AS unnest(v.column2)" at position 0 and "v.column2 AS unnest(v.column2)" at position 1 have the same name. Consider aliasing ("AS") one of them. ``` (note: a real case would be something like `select unnest(named_struct)['foo'] as foo, unnest(named_struct)['bar'] as bar`). ### Additional context Maybe related: #5364 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org