Michael-J-Ward commented on PR #867: URL: https://github.com/apache/datafusion-python/pull/867#issuecomment-2342254957
Alright - I've narrowed down the bug. I'm fairly certain this is upstream but haven't reproduced it in Rust yet.

I have a commit here that re-implements upstream `with_column` and adds a bunch of debug prints.

https://github.com/Michael-J-Ward/datafusion-python/commit/6014e8c33d4a51a395f2ac2e149daf7156695b61

The thing to notice in this log is that `adding window function with alias` gets printed twice.

I **think** the simplified version should be any

```rust
df
    .with_column("foo", <normal expr>)
    .with_column("bar", <window expr>)
```

```console
adding column: "total_value" with expr: PyExpr { expr: WindowFunction(WindowFunction { fun: AggregateUDF(AggregateUDF { inner: Sum { signature: Signature { type_signature: UserDefined, volatility: Immutable } } }), args: [Column(Column { relation: None, name: "value" })], partition_by: [], order_by: [], window_frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false }, null_treatment: None }) }
window_func_exprs: [WindowFunction(WindowFunction { fun: AggregateUDF(AggregateUDF { inner: Sum { signature: Signature { type_signature: UserDefined, volatility: Immutable } } }), args: [Column(Column { relation: None, name: "value" })], partition_by: [], order_by: [], window_frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false }, null_treatment: None })]
col_exists: true, window_func: true
plan: WindowAggr: windowExpr=[[sum(value) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
  Aggregate: groupBy=[[?table?.ps_partkey]], aggr=[[sum(value) AS value]]
    Projection: ?table?.n_nationkey, ?table?.n_name, ?table?.s_suppkey, ?table?.s_nationkey, ?table?.ps_supplycost, ?table?.ps_availqty, ?table?.ps_suppkey, ?table?.ps_partkey, ?table?.ps_supplycost * ?table?.ps_availqty AS value
      Inner Join: ?table?.s_suppkey = ?table?.ps_suppkey
        Inner Join: ?table?.n_nationkey = ?table?.s_nationkey
          Filter: ?table?.n_name = Utf8("GERMANY")
            Projection: ?table?.n_nationkey, ?table?.n_name
              TableScan: ?table?
          Projection: ?table?.s_suppkey, ?table?.s_nationkey
            TableScan: ?table?
        Projection: ?table?.ps_supplycost, ?table?.ps_availqty, ?table?.ps_suppkey, ?table?.ps_partkey
          TableScan: ?table?
qualifier: Some(Bare { table: "?table?" }), field: Field { name: "ps_partkey", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }
adding column
qualifier: None, field: Field { name: "value", data_type: Decimal128(36, 2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
adding window function with alias
qualifier: None, field: Field { name: "sum(value) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", data_type: Decimal128(38, 2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
adding window function with alias
col exists - not pushing Alias(Alias { expr: WindowFunction(WindowFunction { fun: AggregateUDF(AggregateUDF { inner: Sum { signature: Signature { type_signature: UserDefined, volatility: Immutable } } }), args: [Column(Column { relation: None, name: "value" })], partition_by: [], order_by: [], window_frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false }, null_treatment: None }), relation: None, name: "total_value" })
Traceback (most recent call last):
  File "/home/mike/workspace/datafusion-python/dev/examples/tpch/q11_important_stock_identification.py", line 70, in <module>
    df = df.with_column(
         ^^^^^^^^^^^^^^^
  File "/home/mike/workspace/datafusion-python/dev/python/datafusion/dataframe.py", line 164, in with_column
    return DataFrame(self.df.with_column(name, expr.expr))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: Error during planning: Projections require unique expression names but the expression "value AS total_value" at position 1 and "sum(value) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS total_value" at position 2 have the same name. Consider aliasing ("AS") one of them.
```
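To make the suspected failure mode concrete, here is a plain-Python model of what the debug output seems to show. It is only a sketch inferred from the log, not the upstream Rust code: `Field`, `model_with_column`, and `first_duplicate` are hypothetical names, and the branching (every unqualified field gets aliased to the new name when the expression is a window function) is an assumption based on `adding window function with alias` being printed for both `value` and the window field.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; qualifier=None means unqualified."""
    qualifier: Optional[str]
    name: str


def model_with_column(schema: List[Field], name: str, is_window: bool) -> List[str]:
    """Return the output names of the projection this model of with_column builds.

    Assumption: for a window expression, col_exists starts out true (as in the
    log) and every unqualified field is aliased to `name`.
    """
    col_exists = is_window
    names: List[str] = []
    for field in schema:
        if field.name == name:
            col_exists = True
            names.append(name)        # existing column replaced by the new one
        elif is_window and field.qualifier is None:
            names.append(name)        # "adding window function with alias"
        else:
            names.append(field.name)  # "adding column"
    if not col_exists:
        names.append(name)            # plain new column appended at the end
    return names


def first_duplicate(names: List[str]) -> Optional[Tuple[str, int, int]]:
    """Return (name, first_position, second_position) for the first clash, if any."""
    seen: Dict[str, int] = {}
    for pos, n in enumerate(names):
        if n in seen:
            return (n, seen[n], pos)
        seen[n] = pos
    return None


WINDOW_FIELD = "sum(value) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"

# Schema of the WindowAggr plan, as listed by the qualifier/field lines in the log.
schema = [
    Field("?table?", "ps_partkey"),
    Field(None, "value"),
    Field(None, WINDOW_FIELD),
]

names = model_with_column(schema, "total_value", is_window=True)
print(names)                   # ['ps_partkey', 'total_value', 'total_value']
print(first_duplicate(names))  # ('total_value', 1, 2)
```

Under this model, the earlier unqualified `value` column and the window field both end up named `total_value`, at positions 1 and 2, which lines up with the positions reported in the planning error.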
