adriangb commented on code in PR #19538:
URL: https://github.com/apache/datafusion/pull/19538#discussion_r2700282271


##########
datafusion/sqllogictest/test_files/limit.slt:
##########
@@ -846,10 +846,10 @@ logical_plan
 05)--------Sort: test_limit_with_partitions.part_key ASC NULLS LAST, fetch=1
 06)----------TableScan: test_limit_with_partitions projection=[part_key]
 physical_plan
-01)ProjectionExec: expr=[1 as foo]
-02)--SortPreservingMergeExec: [part_key@0 ASC NULLS LAST], fetch=1
-03)----SortExec: TopK(fetch=1), expr=[part_key@0 ASC NULLS LAST], preserve_partitioning=[true]
-04)------DataSourceExec: file_groups={3 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit/test_limit_with_partitions/part-0.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit/test_limit_with_partitions/part-1.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit/test_limit_with_partitions/part-2.parquet]]}, projection=[part_key], file_type=parquet, predicate=DynamicFilter [ empty ]
+01)ProjectionExec: expr=[foo@0 as foo]
+02)--SortPreservingMergeExec: [part_key@1 ASC NULLS LAST], fetch=1
+03)----SortExec: TopK(fetch=1), expr=[part_key@1 ASC NULLS LAST], preserve_partitioning=[true]
+04)------DataSourceExec: file_groups={3 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit/test_limit_with_partitions/part-0.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit/test_limit_with_partitions/part-1.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit/test_limit_with_partitions/part-2.parquet]]}, projection=[1 as foo, part_key], file_type=parquet, predicate=DynamicFilter [ empty ]

Review Comment:
   @alamb one issue I've run into is that it's hard to encode the logic of
   "don't push down literals when they're on their own or combined only with
   columns, but do push them down when they're arguments to a scalar function
   that will not create a RecordBatch from them".
   E.g. `select 1 as foo, get_field(struct, 'another literal') ...`.
   
   I think the right thing to do here is to let `get_field` handle the 
"recursion". I.e. instead of our current logic of:
   
   ```rust
   // In ScalarFunctionExpr
   fn is_trivial(&self) -> bool {
       if !self.fun.is_trivial() {
           return false;
       }
       self.args.iter().all(|arg| arg.is_trivial())
   }
   ```
   
   Into:
   
   ```rust
   // In ScalarFunctionExpr: delegate the decision entirely to the function,
   // passing the arguments so it can decide based on what it will do with them.
   fn is_trivial(&self) -> bool {
       self.fun.is_trivial(&self.args)
   }
   ```
   
   Or something like that. Realistically, only the function knows whether what
   it's going to do with its arguments is efficient.
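   
   For illustration, here's a minimal, self-contained sketch of what that
   delegated check could look like for `get_field`; the `Arg` trait and the
   `get_field_is_trivial` function are placeholder names for this sketch, not
   DataFusion types:
   
   ```rust
   // Placeholder trait standing in for whatever expression type the hook ends
   // up receiving; it exists only so this sketch compiles on its own.
   trait Arg {
       fn is_trivial(&self) -> bool;
       fn is_literal(&self) -> bool;
   }
   
   // Hypothetical: how `get_field` could answer the triviality question itself.
   // `get_field(source, 'name')` stays cheap as long as the source is cheap and
   // the field name is a literal, since no new RecordBatch is built from them.
   fn get_field_is_trivial(args: &[&dyn Arg]) -> bool {
       match args {
           [source, name] => source.is_trivial() && name.is_literal(),
           _ => false,
       }
   }
   ```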
   
   But there are two issues with this:
   1. We need two methods on `ScalarFunctionUDFImpl`, one for the logical layer
   and one for the physical layer (`is_trivial_logical(args: &[Expr])` and
   `is_trivial_physical(args: &[Arc<dyn PhysicalExpr>])`, or something like
   that; see the sketch after this list).
   2. `ScalarFunctionUDFImpl` can't even reference `PhysicalExpr` because of
   crate dependency cycles 😢
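   
   Here's a rough, self-contained sketch of what those two hooks could look
   like; `TrivialityHooks`, `LogicalExpr`, and `PhysicalExprLike` are
   placeholder names, not DataFusion's actual trait or types:
   
   ```rust
   use std::sync::Arc;
   
   // Stand-ins for the logical `Expr` and the physical `Arc<dyn PhysicalExpr>`;
   // they exist only so this sketch compiles on its own.
   pub struct LogicalExpr;
   pub trait PhysicalExprLike {}
   
   // Hypothetical shape of issue 1: two hooks on the UDF trait, one per layer.
   // The method names follow the suggestion above and are not an existing API;
   // issue 2 is exactly that the real trait can't name `PhysicalExpr` like this.
   pub trait TrivialityHooks {
       /// Decide from the logical arguments whether the call is cheap to evaluate.
       fn is_trivial_logical(&self, _args: &[LogicalExpr]) -> bool {
           // Conservative default: treat the call as non-trivial (don't push down).
           false
       }
   
       /// The same decision, made from the physical arguments.
       fn is_trivial_physical(&self, _args: &[Arc<dyn PhysicalExprLike>]) -> bool {
           false
       }
   }
   ```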
   
   Any thoughts?


