Nowadays many compute engines do not use Calcite's Enumerable operators and use only the logical nodes for planning. After all, the Enumerable implementations are specific to Calcite. So I still think Calcite needs to give a more accurate definition of what an equi-join is.
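The sketch below makes the two candidate definitions from this thread concrete. The first is the current strict rule ("field = field"). The second is the relaxed rule proposed below: deterministic expressions are allowed, as long as each side of the equality references fields of only one join input. Expr, InputRef, Call and Equals are hypothetical stand-ins for RexNode, RexInputRef and RexCall. They are not Calcite's API.

```java
import java.util.List;

/**
 * Hypothetical sketch: classifying "left = right" join conjuncts under two
 * candidate definitions of an equi-join. Expr, InputRef, Call and Equals are
 * simplified stand-ins, NOT Calcite's RexNode / RexInputRef / RexCall.
 */
public class EquiJoinCheck {

    interface Expr {}

    /** Stand-in for RexInputRef: field index in the joined row type. */
    record InputRef(int index) implements Expr {}

    /** Stand-in for RexCall: an operator (e.g. CAST) applied to operands. */
    record Call(String op, boolean deterministic, List<Expr> operands) implements Expr {}

    /** One conjunct of the join condition: left = right. */
    record Equals(Expr left, Expr right) {}

    static final int LEFT = 0, RIGHT = 1, MIXED = -1, NONE = -2;

    /** Which join input the fields referenced by e come from. */
    static int side(Expr e, int leftFieldCount) {
        if (e instanceof InputRef ref) {
            return ref.index() < leftFieldCount ? LEFT : RIGHT;
        }
        int result = NONE; // no field references seen yet (e.g. a constant call)
        for (Expr operand : ((Call) e).operands()) {
            int s = side(operand, leftFieldCount);
            if (s == MIXED) return MIXED;
            if (s == NONE) continue;
            if (result == NONE) result = s;
            else if (result != s) return MIXED;
        }
        return result;
    }

    /** True if e and all of its nested calls are deterministic. */
    static boolean isDeterministic(Expr e) {
        if (e instanceof InputRef) return true;
        Call call = (Call) e;
        return call.deterministic()
            && call.operands().stream().allMatch(EquiJoinCheck::isDeterministic);
    }

    /** One operand comes from the left input, the other from the right. */
    static boolean spansBothInputs(Equals e, int leftFieldCount) {
        int l = side(e.left(), leftFieldCount);
        int r = side(e.right(), leftFieldCount);
        return (l == LEFT && r == RIGHT) || (l == RIGHT && r == LEFT);
    }

    /** Current rule: an equi-join key is "field = field". */
    static boolean isStrictEquiKey(Equals e, int leftFieldCount) {
        return e.left() instanceof InputRef && e.right() instanceof InputRef
            && spansBothInputs(e, leftFieldCount);
    }

    /** Relaxed rule from the thread: deterministic expressions are allowed too. */
    static boolean isRelaxedEquiKey(Equals e, int leftFieldCount) {
        return isDeterministic(e.left()) && isDeterministic(e.right())
            && spansBothInputs(e, leftFieldCount);
    }

    public static void main(String[] args) {
        // A(a) is the left input ($0), B(b) is the right input ($1).
        int leftFieldCount = 1;
        // cast(A.a as int) = B.b
        Equals castCond = new Equals(
            new Call("CAST", true, List.of(new InputRef(0))), new InputRef(1));
        System.out.println(isStrictEquiKey(castCond, leftFieldCount));  // false
        System.out.println(isRelaxedEquiKey(castCond, leftFieldCount)); // true
    }
}
```

The strict rule rejects the CAST condition. That is why option 2 in Ruben's reply below is analyzed as a non-equi-join. The relaxed rule would accept it, but an engine would still have to evaluate the CAST before it could use the result as a key. That is the extra computation Ruben points out.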
Best,
Danny Chan

On Apr 16, 2019, 12:19 AM +0800, Ruben Q L wrote:
> Danny,
> I have seen the full picture and I have actually changed my mind:
>
> If I am not mistaken, currently the way to make your example (and mine)
> work as an EquiJoin is to use intermediate projections (so that the RexCall
> / RexFieldAccess "becomes" a RexInputRef):
>
> Select A.a, B.b from A join B on cast(A.a as int) = B.b
>
> => option 1 (analyzed as equijoin)
> Project($0, $2)
>   Join(condition: $1 = $2) -- i.e. cast(A.a as int) = B.b
>     Project($0=a; $1=cast($0 as int))
>       Scan(A)
>     Scan(B)
>
> => option 2 (analyzed as non-equijoin)
> Project($0, $1)
>   Join(condition: cast($0 as int) = $1) -- i.e. cast(A.a as int) = B.b
>     Scan(A)
>     Scan(B)
>
> It might seem "wrong", but the thing is, the Enumerable implementations
> that extend EquiJoin (i.e. EnumerableJoin, EnumerableMergeJoin,
> EnumerableSemiJoin) are based on the EquiJoin fields:
>   public final ImmutableIntList leftKeys;
>   public final ImmutableIntList rightKeys;
>
> They rely on the fact that these fields represent an equality between the
> leftKeys and rightKeys field indices, so accessors for those fields can be
> generated directly without any extra computation (i.e. without any extra
> call). That is why EquiJoin cannot support RexCall / RexFieldAccess: they
> cannot be translated into a key (i.e. into a field index).
>
> Given this, we could improve the logic to support more complex equi-join
> conditions, but I do not think it would be worth it, because the
> alternative is quite simple: add a projection for the RexCall
> / RexFieldAccess and keep the existing (simple) logic.
>
> For this reason, I think we should stick to the current logic, *an equi-join
> is "field = field", not "expression = field"*, and I should abandon and
> close https://issues.apache.org/jira/browse/CALCITE-2898
>
> Best,
> Ruben
>
>
> On Mon, Apr 15, 2019 at 14:13, Yuzhao Chen wrote:
>
> > Thanks Ruben, the issue really answers my questions. I ran into this while
> > doing CALCITE-2969, when I refactored SemiJoinRule. I think that not only
> > RexFieldAccess but any RexCall should fit this case, as long as the
> > RexCall's function is deterministic. What do you think?
> >
> > Best,
> > Danny Chan
> > On Apr 15, 2019, 7:48 PM +0800, Ruben Q L wrote:
> > > Danny,
> > > In the context of https://issues.apache.org/jira/browse/CALCITE-2898, a
> > > discussion about this topic was started. In that ticket I pointed out
> > > that Calcite does not recognize "RexFieldAccess = RexInputRef" as an
> > > EquiJoin condition (even though the RexFieldAccess itself references a
> > > RexInputRef). This is similar to the situation you propose,
> > > "RexCall = RexInputRef". According to Julian Hyde's comment on that
> > > ticket: *'For our purposes, an equi-join is "field = field", not
> > > "expression = field". Even if that expression is a reference to
> > > sub-field'.* However, I agree with you that this definition should perhaps
> > > be reviewed (I believe your example and mine should be valid cases of
> > > EquiJoin). Changing it may break some pieces of the current code, though,
> > > so the modification might not be straightforward.
> > >
> > > Best,
> > > Ruben
> > >
> > >
> > > On Mon, Apr 15, 2019 at 13:25, Xiening Dai wrote:
> > >
> > > > I think Calcite always pushes down equal join conditions. In
> > > > SqlToRelConverter.createJoin(), before the function returns, it calls
> > > > RelOptUtil.pushDownJoinConditions(). So in your example, the cast
> > > > expression will be pushed down and it will still be an equi-join.
> > > >
> > > > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen wrote:
> > > > >
> > > > > If we check the Javadoc for Calcite's EquiJoin, it gives this
> > > > > definition:
> > > > > > for any join whose condition is based on column equality
> > > > >
> > > > > But what if there are function calls in the operands of the
> > > > > equi-condition? For example, should we consider
> > > > >
> > > > > Select A.a, B.b from A join B on cast(A.a as int) = B.b
> > > > >
> > > > > as an equi-join?
> > > > >
> > > > > Currently Calcite thinks it is not, which I think loses some
> > > > > opportunities for SQL plan optimization, e.g. join condition push-down.
> > > > >
> > > > > Best,
> > > > > Danny Chan
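The workaround Ruben and Xiening describe moves the expression into a Project below the join, so that the join condition becomes "field = field". The sketch below shows that rewrite as a minimal, hypothetical illustration. The Expr, InputRef, Call and Rewrite types and the pushDown method were invented for this sketch and are not Calcite's RelNode/RexNode API. According to Xiening's message, Calcite itself performs this step through RelOptUtil.pushDownJoinConditions(), called from SqlToRelConverter.createJoin().

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the workaround from this thread: move the
 * expression into a Project below the join so the condition becomes
 * "field = field". These types are simplified stand-ins, NOT Calcite's API.
 */
public class PushDownEquiExpr {

    interface Expr {}

    /** Stand-in for RexInputRef; printed as $index. */
    record InputRef(int index) implements Expr {
        @Override public String toString() { return "$" + index; }
    }

    /** Stand-in for RexCall; printed as OP[operands]. */
    record Call(String op, List<Expr> operands) implements Expr {
        @Override public String toString() { return op + operands; }
    }

    /**
     * leftProject: expressions projected over the left input.
     * leftKey/rightKey: key indices in the joined row type after the rewrite.
     * topProject: field indices that restore the original join output.
     */
    record Rewrite(List<Expr> leftProject, int leftKey, int rightKey,
                   List<Integer> topProject) {}

    /**
     * Rewrites "leftExpr = $rightRef". leftExpr is assumed (not checked) to
     * reference only left-input fields; rightRef is an index in the original
     * joined row type. The mirror case (expression on the right) is symmetric.
     */
    static Rewrite pushDown(Expr leftExpr, int rightRef,
                            int leftFieldCount, int rightFieldCount) {
        List<Expr> leftProject = new ArrayList<>();
        for (int i = 0; i < leftFieldCount; i++) {
            leftProject.add(new InputRef(i)); // keep the original left fields
        }
        leftProject.add(leftExpr); // computed key lands at index leftFieldCount

        // Right-input fields shift by one, because the left side gained a column.
        int leftKey = leftFieldCount;
        int rightKey = rightRef + 1;

        // The top projection drops the computed column again.
        List<Integer> topProject = new ArrayList<>();
        for (int i = 0; i < leftFieldCount; i++) topProject.add(i);
        for (int j = 0; j < rightFieldCount; j++) topProject.add(leftFieldCount + 1 + j);
        return new Rewrite(leftProject, leftKey, rightKey, topProject);
    }

    public static void main(String[] args) {
        // Select A.a, B.b from A join B on cast(A.a as int) = B.b
        Expr castA = new Call("CAST", List.of(new InputRef(0)));
        Rewrite r = pushDown(castA, 1, 1, 1);
        System.out.println("Project" + r.topProject());
        System.out.println("  Join($" + r.leftKey() + " = $" + r.rightKey() + ")");
        System.out.println("    Project" + r.leftProject());
        System.out.println("      Scan(A)");
        System.out.println("    Scan(B)");
    }
}
```

Running main prints a plan with the same shape as option 1 in Ruben's reply:

Project[0, 2]
  Join($1 = $2)
    Project[$0, CAST[$0]]
      Scan(A)
    Scan(B)

The cost of the expression is paid once in the lower Project. The join itself then only compares two field indices, which is the form that leftKeys/rightKeys can express.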
