[I] Projection pushdown for column narrowing expressions [datafusion]

via GitHub Thu, 18 Dec 2025 05:07:52 -0800


adriangb opened a new issue, #19387:
URL: https://github.com/apache/datafusion/issues/19387


   The projection pushdown optimizer rule and its per-operator implementations generally push a projection down only if it "narrows" the schema (i.e. has fewer output expressions than input expressions) and the output expressions are all columns or literals:
   
   
https://github.com/apache/datafusion/blob/d68b629dc610972295d8f310b09cd854cf250dd3/datafusion/physical-plan/src/filter.rs#L470-L471
   
   
https://github.com/apache/datafusion/blob/d68b629dc610972295d8f310b09cd854cf250dd3/datafusion/physical-plan/src/repartition/mod.rs#L1045-L1055
   
   
https://github.com/apache/datafusion/blob/d68b629dc610972295d8f310b09cd854cf250dd3/datafusion/physical-plan/src/projection.rs#L255-L268
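
To make the condition above concrete, here is a minimal sketch of the check in Rust. It is only an illustration: the `Expr` enum and the `can_push_down` helper are hypothetical names made up for this example and do not exist in DataFusion; the real checks live in the linked files.

```rust
// Toy model of a projection expression. Hypothetical: these are NOT
// DataFusion's real types, only enough structure to show the rule.
#[derive(Debug, Clone)]
enum Expr {
    /// Reference to an input column by index.
    Column(usize),
    /// A constant value.
    Literal(i64),
    /// Struct field access, e.g. large_struct['small_int_field'].
    GetField(Box<Expr>, String),
    /// Binary multiplication of two sub-expressions.
    Multiply(Box<Expr>, Box<Expr>),
}

/// True only for the two expression kinds the rule accepts.
fn is_column_or_literal(expr: &Expr) -> bool {
    matches!(expr, Expr::Column(_) | Expr::Literal(_))
}

/// The pushdown condition as described in this issue (assumed shape):
/// the projection must have fewer outputs than the input has columns,
/// and every output must be a plain column or a literal.
fn can_push_down(projection: &[Expr], input_width: usize) -> bool {
    projection.len() < input_width && projection.iter().all(is_column_or_literal)
}
```

With a two-column input such as (`id`, `large_struct`), `can_push_down(&[Expr::Column(1)], 2)` holds, while a projection containing `get_field(large_struct, 'small_int_field') * 2` is rejected even though it has fewer outputs than inputs.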
   
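   This is problematic with a plan like the one below: `large_struct['small_int_field'] * 2` is neither a column nor a literal, so it stays in the top-level `ProjectionExec`, and the scan still reads the whole `large_struct` column (`projection=[id, large_struct]`):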
   
   ```
   copy (
     select 1 as id, named_struct('large_string_field', 'big text!', 'small_int_field', 2) as large_struct
   )
   TO 'struct.parquet';
   
   create external table t stored as parquet location 'struct.parquet';
   
   explain format indent
   select large_struct['small_int_field'] * 2 from t where id = 1; 
   ```
   
   ```
   +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                                                   |
   +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: get_field(t.large_struct, Utf8("small_int_field")) * Int64(2)                                                                                              |
   |               |   Filter: t.id = Int64(1)                                                                                                                                              |
   |               |     TableScan: t projection=[id, large_struct], partial_filters=[t.id = Int64(1)]                                                                                      |
   | physical_plan | ProjectionExec: expr=[get_field(large_struct@0, small_int_field) * 2 as t.large_struct[small_int_field] * Int64(2)]                                                    |
   |               |   CoalesceBatchesExec: target_batch_size=8192                                                                                                                          |
   |               |     FilterExec: id@0 = 1, projection=[large_struct@1]                                                                                                                  |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                                            |
   |               |         DataSourceExec: file_groups={1 group: [[Users/adrian/GitHub/datafusion/struct.parquet]]}, projection=[id, large_struct], file_type=parquet, predicate=id@0 = 1 |
   |               |                                                                                                                                                                        |
   +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

