vbarua commented on code in PR #13931: URL: https://github.com/apache/datafusion/pull/13931#discussion_r1899786865
########## datafusion/substrait/src/logical_plan/producer.rs: ########## @@ -730,32 +1035,23 @@ fn to_substrait_named_struct(schema: &DFSchemaRef) -> Result<NamedStruct> { } fn to_substrait_join_expr( - state: &dyn SubstraitPlanningState, + producer: &mut impl SubstraitProducer, join_conditions: &Vec<(Expr, Expr)>, eq_op: Operator, - left_schema: &DFSchemaRef, - right_schema: &DFSchemaRef, - extensions: &mut Extensions, + join_schema: &DFSchemaRef, ) -> Result<Option<Expression>> { // Only support AND conjunction for each binary expression in join conditions let mut exprs: Vec<Expression> = vec![]; for (left, right) in join_conditions { - // Parse left - let l = to_substrait_rex(state, left, left_schema, 0, extensions)?; - // Parse right - let r = to_substrait_rex( - state, - right, - right_schema, - left_schema.fields().len(), // offset to return the correct index - extensions, - )?; + let l = producer.consume_expr(left, join_schema)?; + let r = producer.consume_expr(right, join_schema)?; Review Comment: > Does this change affect the produced substrait plan? Checking with the test I added, it does not actually. Taking a step back as well. When I was updating this code I was curious why the code in https://github.com/apache/datafusion/pull/6135 was added, and I may have misunderstood what it was trying to fix. What I did notice when refactoring this code was that we were already concatenating the schemas and then using that to convert the filter on the join https://github.com/apache/datafusion/blob/9b5995fa024d95c19e1270447e13f3c9dd428c69/datafusion/substrait/src/logical_plan/producer.rs#L465-L475 If it can be used to convert the filter, which itself can contain expressions referencing either side of the join, then it should be possible to use that schema to convert the expressions in the join condition as well. Based on this, I removed the column offset code. As I understand it, something like ```sql SELECT * FROM foo JOIN foo ON id = id ``` where `foo` has an `id` column, is rejected with ``` Schema error: Ambiguous reference to unqualified field table_name ``` If we qualify both side ```sql SELECT * FROM foo JOIN foo ON l.id = r.id ``` it works fine. If DataFusion rejects queries where the column name references is ambiguous, it should be possible to look up the column in the combined schema generally. You're work in https://github.com/apache/datafusion/pull/11049 made it possible to read plans where both sides of the join had columns with the same name, which would otherwise fail. That probably affected the testing code, but not the producer behaviour. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org