liukun4515 opened a new issue, #3583: URL: https://github.com/apache/arrow-datafusion/issues/3583
**Describe the bug** in this https://github.com/apache/arrow-datafusion/pull/3396 pr, I remove the type coercion in the physical phase. We need do the type coercion in the logical phase as much as early. But after do the type coercion, there is some bugs for this. For example: test case `tpch_explain_q10` the `ProjectionPushDown` will not work for this test case. I add some log to debug this, find the cause reason is the type and column name for this. ``` // aggregate: // * remove any aggregate expression that is not required // * construct the new set of required columns // Find distinct group by exprs in the case where we have a grouping set let all_group_expr: Vec<Expr> = grouping_set_to_exprlist(group_expr)?; exprlist_to_columns(&all_group_expr, &mut new_required_columns)?; warn!("the input plan schema {:?}", plan.schema().fields()); // Gather all columns needed for expressions in this Aggregate let mut new_aggr_expr = Vec::new(); aggr_expr.iter().try_for_each(|expr| { let name = &expr.name()?; let column = Column::from_name(name); warn!("debug iter agg expr: {} \n {} \n {} \n", expr, name, column); if required_columns.contains(&column) { new_aggr_expr.push(expr.clone()); new_required_columns.insert(column); // add to the new set of required columns expr_to_columns(expr, &mut new_required_columns) } else { Ok(()) } })?; warn!( "debug optimizer exprs: {:?} due to unexpected colums: {:?}, the new agg_expr: {:?}", aggr_expr, required_columns, new_aggr_expr ); ``` run the test ``` RUST_LOG=trace cargo test --package datafusion --test sql_integration sql::explain_analyze::tpch_explain_q10 -- --exact ``` Get the warn long ``` [2022-09-22T02:27:25Z WARN datafusion_optimizer::projection_push_down] the input plan schema [DFField { qualifier: Some("customer"), field: Field { name: "c_custkey", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false , metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { na me: "c_acctbal", data_type: Decimal128(15, 2), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_phone", data_type: Utf8, nullable: false, dict_id: 0, d ict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("nation"), field: Field { name: "n_name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer") , field: Field { name: "c_address", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field { name: "c_comment", data_type: Utf8, nullable: false, dict_ id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: None, field: Field { name: "SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)", data_type: Decimal128(38, 4), nullable: true, dict_id: 0, dict_is_ordered : false, metadata: None } }] [2022-09-22T02:27:25Z WARN datafusion_optimizer::projection_push_down] debug iter agg expr: SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) AS D ecimal128(38, 4))) SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount) #SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount) [2022-09-22T02:27:25Z WARN datafusion_optimizer::projection_push_down] debug optimizer exprs: [SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) A S Decimal128(38, 4)))] due to unexpected colums: {Column { relation: Some("nation"), name: "n_name" }, Column { relation: Some("customer"), name: "c_name" }, Column { relation: None, name: "SUM(lineitem.l_extendedprice * Int64(1) - lineit em.l_discount)" }, Column { relation: None, name: "revenue" }, Column { relation: Some("customer"), name: "c_custkey" }, Column { relation: Some("customer"), name: "c_address" }, Column { relation: Some("customer"), name: "c_comment" }, C olumn { relation: Some("customer"), name: "c_acctbal" }, Column { relation: Some("customer"), name: "c_phone" }}, the new agg_expr: [] ``` From the log, we can see In the schema, a field is `Field { name: "SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)"` Bu the expr is `SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 2)) AS D ecimal128(38, 4)))` and the column is `#SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - lineitem.l_discount)` The difference is `Int64(1)` and `Decimal128(Some(100),23,2`, the decimal128 is got from the type coercion. **To Reproduce** Steps to reproduce the behavior: **Expected behavior** A clear and concise description of what you expected to happen. **Additional context** Add any other context about the problem here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
