ahirner opened a new issue, #10935:
URL: https://github.com/apache/datafusion/issues/10935

   ### Is your feature request related to a problem or challenge?
   
   Currently where clauses in a sub query are not pushed below `Unnest` 
expressions. In conjunction with some other limitations, it composing views 
with `unnest()` not ideal.
   
   In `0.39.0` cli:
   ```
   > CREATE TABLE IF NOT EXISTS v AS VALUES(1,[1,2,3]),(2,[3,4,5]);
   0 row(s) fetched.
   Elapsed 0.008 seconds.
   ```
   
   We get equal results with or w/o a subquery:
   ```
   > select unnest(column2) from v where column1=2;
   +-------------------+
   | unnest(v.column2) |
   +-------------------+
   | 3                 |
   | 4                 |
   | 5                 |
   +-------------------+
   3 row(s) fetched.
   Elapsed 0.011 seconds.
   ```
   
   ```
   > select "unnest(v.column2)" from (select unnest(column2), column1 from v) 
where column1=2;
   +-------------------+
   | unnest(v.column2) |
   +-------------------+
   | 3                 |
   | 4                 |
   | 5                 |
   +-------------------+
   3 row(s) fetched.
   Elapsed 0.015 seconds.
   ```
   
   `Filter` and `FilterExec` are above projections with `unnest`, instead of 
below.
   Ok:
   ```
   > explain select unnest(column2) from v where column1=2;
   
+---------------+------------------------------------------------------------------------+
   | plan_type     | plan                                                       
            |
   
+---------------+------------------------------------------------------------------------+
   | logical_plan  | Unnest: lists[unnest(v.column2)] structs[]                 
            |
   |               |   Projection: v.column2 AS unnest(v.column2)               
            |
   |               |     Filter: v.column1 = Int64(2)                           
            |
   |               |       TableScan: v projection=[column1, column2]           
            |
   | physical_plan | UnnestExec                                                 
            |
   |               |   RepartitionExec: partitioning=RoundRobinBatch(8), 
input_partitions=1 |
   |               |     ProjectionExec: expr=[column2@1 as unnest(v.column2)]  
            |
   |               |       CoalesceBatchesExec: target_batch_size=8192          
            |
   |               |         FilterExec: column1@0 = 2                          
            |
   |               |           MemoryExec: partitions=1, partition_sizes=[1]    
            |
   |               |                                                            
            |
   
+---------------+------------------------------------------------------------------------+
   ```
   
   Problematic:
   ```
   > explain select "unnest(v.column2)" from (select unnest(column2), column1 
from v) where column1=2;
   
+---------------+---------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                           |
   
+---------------+---------------------------------------------------------------------------------------+
   | logical_plan  | Projection: unnest(v.column2)                              
                           |
   |               |   Filter: v.column1 = Int64(2)                             
                           |
   |               |     Unnest: lists[unnest(v.column2)] structs[]             
                           |
   |               |       Projection: v.column2 AS unnest(v.column2), 
v.column1                           |
   |               |         TableScan: v projection=[column1, column2]         
                           |
   | physical_plan | ProjectionExec: expr=[unnest(v.column2)@0 as 
unnest(v.column2)]                       |
   |               |   CoalesceBatchesExec: target_batch_size=8192              
                           |
   |               |     FilterExec: column1@1 = 2                              
                           |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(8), 
input_partitions=1            |
   |               |         UnnestExec                                         
                           |
   |               |           ProjectionExec: expr=[column2@1 as 
unnest(v.column2), column1@0 as column1] |
   |               |             MemoryExec: partitions=1, partition_sizes=[1]  
                           |
   |               |                                                            
                           |
   
+---------------+---------------------------------------------------------------------------------------+
   ```
   
   
   ### Describe the solution you'd like
   
   `FilterExec` is pushed down to tables that support them.
   
   ### Describe alternatives you've considered
   
   A workaround is trying to not use subqueries. However, then one cannot unest 
structs with multiple lists, or unnest a column more than once. Those are also 
unsupported.
   
   ```
   > select unnest(unnest([[1],[1,2],[1,2,3]]));
   type_coercion
   caused by
   This feature is not implemented: Unnest should be rewritten to 
LogicalPlan::Unnest before type coercion
   ```
   
   ```
   > select unnest(column2) as x, unnest(column2) as y from v;
   Error during planning: Projections require unique expression names but the 
expression "v.column2 AS unnest(v.column2)" at position 0 and "v.column2 AS 
unnest(v.column2)" at position 1 have the same name. Consider aliasing ("AS") 
one of them.
   ```
   (note: a real case would be something like `select 
unnest(named_struct)['foo'] as foo, unnest(named_struct)['bar'] as bar`).
   
   ### Additional context
   
   Maybe related: #5364


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to