xudong963 edited a comment on issue #1064: URL: https://github.com/apache/arrow-datafusion/issues/1064#issuecomment-937813055
The bug is located at https://github.com/apache/arrow-datafusion/blob/4687899957463ce81c4795a6d35d31320db0252b/datafusion/src/physical_plan/planner.rs#L836

`input_dfschema` comes from the logical input schema, so the column index (`idx`) is computed against the logical input schema. That index is then wrapped in a physical expr and used at https://github.com/apache/arrow-datafusion/blob/4687899957463ce81c4795a6d35d31320db0252b/datafusion/src/physical_plan/type_coercion.rs#L56

Note that the `schema` there comes from the physical input schema. So when the logical input schema and the physical input schema have a different number of fields, the bug appears.

The most direct fix I can think of is to get the column index from the physical input schema instead: `let idx = input_schema.index_of(c.name.as_str())?;`. But sometimes the column name, the logical input schema field name, and the physical input schema field name do not all match, as in the following case (note `*` in the logical name versus `Multiply` in the physical name):

```sql
select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1995-01-01'
    and l_discount between 0.06 - 0.01 and 0.06 + 0.01
    and l_quantity < 24;
```

```rust
[datafusion/src/physical_plan/planner.rs:836] c = Column {
    relation: None,
    name: "SUM(lineitem.l_extendedprice * lineitem.l_discount)",
}
[datafusion/src/physical_plan/planner.rs:837] input_dfschema = DFSchema {
    fields: [
        DFField {
            qualifier: None,
            field: Field {
                name: "SUM(lineitem.l_extendedprice * lineitem.l_discount)",
                data_type: Float64,
                nullable: true,
                dict_id: 0,
                dict_is_ordered: false,
                metadata: None,
            },
        },
    ],
}
[datafusion/src/physical_plan/planner.rs:838] input_schema = Schema {
    fields: [
        Field {
            name: "SUM(lineitem.l_extendedprice Multiply lineitem.l_discount)",
            data_type: Float64,
            nullable: true,
            dict_id: 0,
            dict_is_ordered: false,
            metadata: None,
        },
    ],
    metadata: {},
}
```

The second option is to wrap the union logical plan in a projection plan, but that logical plan may then be rewritten by the optimizer.
In the case mentioned by @Dandandan, the projection plan wrapped around the union logical plan gets optimized and ends up containing only `d`, so the bug is still there...

Any suggestions on how to handle this would be appreciated, thanks! @alamb @Dandandan @houqp
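
To make the two failure modes above concrete, here is a minimal sketch using only the standard library. `Schema` and `PhysicalColumn` are hypothetical stand-ins written for this illustration, not the real DataFusion or Arrow types, and the field names `a`, `b`, `c` are made up (only `d` and the `SUM(...)` names come from the discussion above). It assumes a physical column expression stores nothing but an index.

```rust
/// Hypothetical stand-in for a schema: an ordered list of field names.
#[derive(Debug)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    fn new(names: &[&str]) -> Self {
        Schema { fields: names.iter().map(|n| n.to_string()).collect() }
    }

    /// Name-based lookup, in the spirit of an `index_of` call.
    fn index_of(&self, name: &str) -> Result<usize, String> {
        self.fields
            .iter()
            .position(|f| f == name)
            .ok_or_else(|| format!("no field named {:?}", name))
    }
}

/// Hypothetical physical column expression that only remembers an index.
struct PhysicalColumn {
    index: usize,
}

impl PhysicalColumn {
    /// Resolve the stored index against whichever schema is passed in.
    fn field_name<'a>(&self, schema: &'a Schema) -> Option<&'a str> {
        schema.fields.get(self.index).map(|s| s.as_str())
    }
}

fn main() {
    // Case 1: the index is computed from a wider logical schema, then
    // resolved against a narrower physical schema.
    let logical = Schema::new(&["a", "b", "c", "d"]);
    let physical = Schema::new(&["d"]);
    let col = PhysicalColumn { index: logical.index_of("d").unwrap() };
    assert_eq!(col.field_name(&logical), Some("d"));
    assert_eq!(col.field_name(&physical), None); // index 3 is out of range

    // Case 2: looking the column up by name in the physical schema fails
    // when the two schemas render the same expression differently.
    let logical = Schema::new(&["SUM(lineitem.l_extendedprice * lineitem.l_discount)"]);
    let physical =
        Schema::new(&["SUM(lineitem.l_extendedprice Multiply lineitem.l_discount)"]);
    let name = &logical.fields[0];
    assert!(physical.index_of(name).is_err());
}
```

Case 1 corresponds to the size mismatch between the logical and physical input schemas, and case 2 to why switching to a name-based lookup on the physical schema is not enough on its own.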
