alamb commented on issue #18337:
URL: https://github.com/apache/datafusion/issues/18337#issuecomment-3474208733

   I spent some more time today debugging this issue.
   - Here is a reproducer: https://github.com/apache/datafusion/pull/18412
   
   
   I added some more debug output and found this error:
   ```
   DataFusion error: Internal error: Internal error planning LogicalPlan::Aggregate. Physical input schema should be the same as the one converted from logical input schema. Differences:
   - field metadata at index 1 [prev_value]: (physical) {"metadata_key": "the nonnull_name field"} vs (logical) {}.
   ```
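   The error comes from a check that compares the physical input schema with the schema converted from the logical plan, field by field. As a rough illustration of why a metadata-only difference is enough to fail planning, here is a minimal sketch of that kind of comparison. It assumes a hypothetical, simplified `Field` type and a made-up `metadata_differences` helper; it is not arrow's `Field` or DataFusion's actual implementation.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical, simplified stand-in for an Arrow field: only the name
   /// and the key/value metadata matter for this check.
   #[derive(Clone, Debug, PartialEq)]
   struct Field {
       name: String,
       metadata: HashMap<String, String>,
   }

   /// Compares two schemas position by position and returns one message per
   /// field whose metadata differs, in the same shape as the error above.
   fn metadata_differences(physical: &[Field], logical: &[Field]) -> Vec<String> {
       physical
           .iter()
           .zip(logical.iter())
           .enumerate()
           .filter(|(_, (p, l))| p.metadata != l.metadata)
           .map(|(i, (p, l))| {
               format!(
                   "field metadata at index {} [{}]: (physical) {:?} vs (logical) {:?}",
                   i, p.name, p.metadata, l.metadata
               )
           })
           .collect()
   }

   fn main() {
       let plain = |name: &str| Field {
           name: name.to_string(),
           metadata: HashMap::new(),
       };

       // The physical side keeps the metadata on `prev_value`...
       let mut prev_value = plain("prev_value");
       prev_value.metadata.insert(
           "metadata_key".to_string(),
           "the nonnull_name field".to_string(),
       );
       let physical = vec![plain("value"), prev_value];

       // ...while the schema derived from the logical plan has none.
       let logical = vec![plain("value"), plain("prev_value")];

       for diff in metadata_differences(&physical, &logical) {
           println!("- {diff}");
       }
   }
   ```

   Running it reports a single difference, at index 1 (`prev_value`), even though names and positions match: any strict equality check on field metadata rejects the plan as soon as one side drops it.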
   
   Here are the logical input plan and the resulting physical plan:
   ```
   [2025-10-31T17:30:07Z DEBUG datafusion::physical_planner] Input plan Projection: table_with_metadata.nonnull_name AS value, lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS prev_value
         WindowAggr: windowExpr=[[lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
           TableScan: table_with_metadata projection=[ts, nonnull_name]

   [2025-10-31T17:30:07Z DEBUG datafusion::physical_planner] Resulting exec: ProjectionExec: expr=[nonnull_name@1 as value, lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@2 as prev_value]
         BoundedWindowAggExec: wdw=[lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { "lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW": nullable Utf8, metadata: {"metadata_key": "the nonnull_name field"} }, frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
           DataSourceExec: partitions=1, partition_sizes=[1]
   ```
   
   So the logical plan loses the metadata on the `prev_value` field, while the physical plan preserves it, which is somewhat counterintuitive.
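   One way two planners can end up disagreeing like this is if one code path derives the `lag` output field from the argument's whole field, while another rebuilds it from the argument's data type alone. The sketch below is a hypothetical illustration of that pattern only: the simplified `Field` type and the functions `output_field_from_field` and `output_field_from_type` are made up, not DataFusion's code, and whether this is what actually happens here is an assumption.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical, simplified field (the real type is arrow's `Field`).
   #[derive(Clone, Debug, PartialEq)]
   struct Field {
       name: String,
       data_type: String,
       nullable: bool,
       metadata: HashMap<String, String>,
   }

   /// Hypothetical path A: start from the argument's whole field, so its
   /// metadata is carried into the `lag` output.
   fn output_field_from_field(name: &str, arg: &Field) -> Field {
       Field {
           name: name.to_string(),
           // `lag` yields NULL when there is no previous row.
           nullable: true,
           ..arg.clone()
       }
   }

   /// Hypothetical path B: rebuild the output from the argument's data type
   /// alone, so any metadata is silently dropped.
   fn output_field_from_type(name: &str, arg: &Field) -> Field {
       Field {
           name: name.to_string(),
           data_type: arg.data_type.clone(),
           nullable: true,
           metadata: HashMap::new(),
       }
   }

   fn main() {
       let mut metadata = HashMap::new();
       metadata.insert(
           "metadata_key".to_string(),
           "the nonnull_name field".to_string(),
       );
       let arg = Field {
           name: "nonnull_name".to_string(),
           data_type: "Utf8".to_string(),
           nullable: false,
           metadata,
       };

       let a = output_field_from_field("prev_value", &arg);
       let b = output_field_from_type("prev_value", &arg);

       // Same name, type, and nullability; only the metadata differs.
       assert_eq!(
           (&a.name, &a.data_type, a.nullable),
           (&b.name, &b.data_type, b.nullable)
       );
       println!("path A metadata: {:?}", a.metadata);
       println!("path B metadata: {:?}", b.metadata);
   }
   ```

   Both paths agree on everything a type-only comparison would check (a nullable `Utf8`), which is why a mismatch like this only surfaces once field metadata is compared as well.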
   
   However, I then looked at the code that determines the type of window functions in
   https://github.com/apache/datafusion/blob/4e0596d0d7a7598762e397778110a58bab5363b9/datafusion/expr/src/expr_schema.rs#L662-L661
   and I would say "it's complicated".
   

