alamb commented on issue #18337: URL: https://github.com/apache/datafusion/issues/18337#issuecomment-3474208733
I spent some time today debugging this issue further. Here is a reproducer: https://github.com/apache/datafusion/pull/18412

I added some more debugging and found:

```
DataFusion error: Internal error: Internal error planning LogicalPlan::Aggregate. Physical input schema should be the same as the one converted from logical input schema. Differences:
  - field metadata at indexl 1 [prev_value]: (physica) {"metadata_key": "the nonnull_name field"} vs (logical) {}.
```

You can see the actual input plans here:

```
[2025-10-31T17:30:07Z DEBUG datafusion::physical_planner] Input plan
Projection: table_with_metadata.nonnull_name AS value, lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS prev_value
  WindowAggr: windowExpr=[[lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
    TableScan: table_with_metadata projection=[ts, nonnull_name]
[2025-10-31T17:30:07Z DEBUG datafusion::physical_planner] Resulting exec:
ProjectionExec: expr=[nonnull_name@1 as value, lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@2 as prev_value]
  BoundedWindowAggExec: wdw=[lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { "lag(table_with_metadata.nonnull_name) ORDER BY [table_with_metadata.ts ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW": nullable Utf8, metadata: {"metadata_key": "the nonnull_name field"} }, frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
    DataSourceExec: partitions=1, partition_sizes=[1]
```

So the logical plan lost the metadata on the `prev_value` field while the physical plan preserved it, which is somewhat counterintuitive.
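
To make the reported mismatch concrete, here is a minimal sketch of a field-by-field metadata comparison between a physical and a logical schema. It uses stand-in types built on the standard library only: `Field`, `metadata_differences`, and `example_differences` are hypothetical names for illustration, not DataFusion's or Arrow's actual API, and the metadata on the `value` column is an assumption (the error above only reports index 1).

```rust
use std::collections::HashMap;

/// Stand-in for an Arrow field, keeping only what this check needs.
/// (Hypothetical type; the real one is arrow's `Field`.)
#[derive(Debug, Clone)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str, metadata: &[(&str, &str)]) -> Self {
        Field {
            name: name.to_string(),
            metadata: metadata
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }
    }
}

/// Compare two schemas position by position and describe every field
/// whose metadata differs between the physical and logical side.
fn metadata_differences(physical: &[Field], logical: &[Field]) -> Vec<String> {
    physical
        .iter()
        .zip(logical.iter())
        .enumerate()
        .filter(|(_, (p, l))| p.metadata != l.metadata)
        .map(|(i, (p, l))| {
            format!(
                "field metadata at index {} [{}]: (physical) {:?} vs (logical) {:?}",
                i, p.name, p.metadata, l.metadata
            )
        })
        .collect()
}

/// Rebuild the situation from the error: the physical plan keeps the
/// metadata on `prev_value`, the logical plan has dropped it.
fn example_differences() -> Vec<String> {
    let physical = vec![
        // Assumption: `value` carries the same metadata on both sides.
        Field::new("value", &[("metadata_key", "the nonnull_name field")]),
        Field::new("prev_value", &[("metadata_key", "the nonnull_name field")]),
    ];
    let logical = vec![
        Field::new("value", &[("metadata_key", "the nonnull_name field")]),
        Field::new("prev_value", &[]),
    ];
    metadata_differences(&physical, &logical)
}
```

Calling `example_differences()` yields a single entry, for index 1 (`prev_value`), which mirrors the shape of the difference in the error message.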

However, I then looked at the code that determines the type of window functions in https://github.com/apache/datafusion/blob/4e0596d0d7a7598762e397778110a58bab5363b9/datafusion/expr/src/expr_schema.rs#L662-L661 and I would say "it's complicated".
