Hello, In many research papers and database books the equi join is defined as an equality operator between two attribute names (columns etc.) which more or less corresponds to the current definition in Calcite thus I don't see an obvious reason to change this definition. If necessary, there could be rules/methods which create the necessary projections below the join to satisfy the requirement of an equi join.
I wouldn't consider Enumerable operators as an implementation detail that is specific to Calcite. Enumerable operators is good way to provide full SQL support to a project that does not naturally support every operation (think of various NoSQL databases which do not support joins for instance). This is exactly what we do in our adapters and I guess it will be similar for other people building adapters over various data sources. Even for projects where the plan is to provide native operators for everything having Enumerables at the beginning is convenient. It can help to kick-off the project fast without waiting for every operator to get implemented. Best, Stamatis On Tue, Apr 16, 2019 at 5:53 AM Yuzhao Chen <[email protected]> wrote: > Now many compute engine do not use Calcite EnumerableXXXs and only use the > logical node for planning, after all, the Enumerables are implementations > are only specific to Calcite, I still think Calcite need to give more > accurate definitions for what equi join is. > > Best, > Danny Chan > 在 2019年4月16日 +0800 AM12:19,Ruben Q L <[email protected]>,写道: > > Danny, > > I have seen the full picture and I have actually changed mind: > > > > If I am not mistaken, currently the way to make your example (and mine) > to > > work as an EquiJoin is using intermediate projections (so that RexCall > > / RexFieldAccess "becomes" RexInputRef): > > > > Select A.a, B.b from A join B on cast(A.a as int) = B.b > > > > => option 1 (analyzed as equijoin) > > Project($0, $2) > > Join(condition: $1 = $2) -- i.e. cast(A.a as int) = B.b > > Project($0=a; $1=cast($0 as int)) > > Scan(A) > > Scan(B) > > > > => option 2 (analyzed as non-equijoin) > > Project($0, $1) > > Join(condition: cast($0 as int) = $1) -- i.e. cast(A.a as int) = B.b > > Scan(A) > > Scan(B) > > > > It might seem "wrong", but the thing is, the Enumerable implementations > > that extend EquiJoin (i.e. EnumerableJoin, EnumerableMergeJoin, > > EnumerableSemiJoin) are based on the EquiJoin fields: > > public final ImmutableIntList leftKeys; > > public final ImmutableIntList rightKeys; > > > > And rely on the the fact that they are representing an equality on > leftKeys > > and rightKeys field indices, and that we can directly generate accessors > > for these fields without any extra computation (i.e. without any extra > > call). That's the reason why EquiJoin cannot support RexCall > > / RexFieldAccess, because they cannot be translatable to a key (i.e. to a > > field index). > > > > With this situation, we could improve this logic to support more complex > > equijoin conditions; but I think this will not be worth it, because the > > alternative is quite simple: add a projection for the RexCall > > / RexFieldAccess and keep the existing (simple) logic. > > > > For this reason, I think we should stick to the current logic *an > equi-join > > is "field = field", not "expression = field" *and I should abandon and > > close https://issues.apache.org/jira/browse/CALCITE-2898 > > > > Best, > > Ruben > > > > > > Le lun. 15 avr. 2019 à 14:13, Yuzhao Chen <[email protected]> a > écrit : > > > > > Thx Ruben, the issue really answer my questions, I encounter this when > > > dong CALCITE-2969, when I refactor SemiJoinRule, I think not only > > > RexFieldAccess, any RexCall should fit into this case, only if the > RexCall > > > function is deterministic, what do you think ? > > > > > > Best, > > > Danny Chan > > > 在 2019年4月15日 +0800 PM7:48,Ruben Q L <[email protected]>,写道: > > > > Danny, > > > > In the context of https://issues.apache.org/jira/browse/CALCITE-2898, > a > > > > discussion about this topic was started. In that ticket I pointed out > > > that > > > > Calcite does not recognize "RexFieldAccess = RexInputRef" as an > EquiJoin > > > > condition (even though the RexFieldAccess itself is referencing a > > > > RexInputRef); which is somewhat similar to the situation that you > propose > > > > "RexCall = RexInputRef". According to Julian Hyde's comment on that > > > > ticket: *'For > > > > our purposes, an equi-join is "field = field", not "expression = > field". > > > > Even if that expression is a reference to sub-field'. *However, I > agree > > > > with you and maybe this definition should be reviewed (I believe your > > > > example and my example should be valid cases of EquiJoin), but > possibly > > > > this will break some pieces of the current code, so the modification > > > might > > > > not be straightforward. > > > > > > > > Best, > > > > Ruben > > > > > > > > > > > > Le lun. 15 avr. 2019 à 13:25, Xiening Dai <[email protected]> a > écrit : > > > > > > > > > I think Calcite always pushes down equal join conditions. In > > > > > SqlToRelConverter.createJoin(), before ruction returns, it calls > > > > > RelOptUtil.pushDownJoinConditions(). So in your example, the cast > > > > > expression will be pushed down and it will still be an equal join. > > > > > > > > > > > On Apr 15, 2019, at 5:40 PM, Yuzhao Chen <[email protected]> > > > wrote: > > > > > > > > > > > > If we checkout the java doc for Calcite EuqiJoin, there is > definition > > > > > for it: > > > > > > > for any join whose condition is based on column equality > > > > > > > > > > > > But what about if there are function calls in the equi condition > > > > > operands ? For example: > > > > > > Should we consider > > > > > > > > > > > > Select A.a, B.b from A join B on cast(A.a as int) = B.b > > > > > > > > > > > > as an equi join ? > > > > > > > > > > > > Now Calcite think it is not, which I think will lost some > > > possibilities > > > > > for sql plan promotion, e.g. join condition push down. > > > > > > > > > > > > Best, > > > > > > Danny Chan > > > > > > > > > > > > > >
