[ 
https://issues.apache.org/jira/browse/CALCITE-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790390#comment-16790390
 ] 

Ruben Quesada Lopez edited comment on CALCITE-2898 at 3/12/19 9:53 AM:
-----------------------------------------------------------------------

Agree, probably the equijoin assertion needs to be added to some other places.
But considering the current discussion, maybe I'm missing something, but IMHO 
the current verification does not make much sense regarding RexFieldAccess, 
because the same expression will be / will not be considered as an equijoin, 
the "trick" will be if we previously perform a projection of the 
"expression.field" to get a RexInputRef for it, e.g.:
{code}
SELECT a1.* from authors a1 WHERE a1.birthPlace.country IN
  (SELECT a2.birthPlace.country FROM authors a2
   WHERE ...)
{code}

This semijoin will NOT work (condition not considered as equijoin condition):
{code:java}
relBuilder.scan("authors").as("a1");
relBuilder.scan("authors").as("a2");
relBuilder.filter(...);
RexNode condition = relBuilder.equals(
    relBuilder.field(relBuilder.field(2, 0, "birthPlace"), "country"),
    relBuilder.field(relBuilder.field(2, 1, "birthPlace"), "country"));
relBuilder.semiJoin(condition);  // NOT considered as equijoin
{code}

But this semijoin, on the same fields will work, just because we have performed 
a projection to be able to access 'country' via an "artificial" RexInputRef 
instead of the "default" RexFieldAccess.RexInputRef:
{code:java}
relBuilder.scan("authors").as("a1");
relBuilder.projectPlus(relBuilder.field(relBuilder.field("a1", "birthPlace"), 
"country")); 
// a1.birthPlace.country projected as a1's field 4
relBuilder.scan("authors").as("a2");
relBuilder.filter(...);
relBuilder.projectPlus(relBuilder.field(relBuilder.field("a2", "birthPlace"), 
"country")); 
// a2.birthPlace.country projected as a2's field 4
RexNode condition = relBuilder.equals(
    relBuilder.field(2, 0, 4),
    relBuilder.field(2, 1, 4));
relBuilder.semiJoin(condition);  // considered as equijoin
{code}

I think this behavioral difference is confusing and prevents us from defining 
valid semijoins on structs' fields easily (without having to perform an extra 
and, in my opinion, avoidable projection)


was (Author: rubenql):
Agree, probably the equijoin assertion needs to be added to some other places.
But considering the current discussion, maybe I'm missing something, but IMHO 
the current verification does not make much sense regarding RexFieldAccess, 
because the same expression will be / will not be considered as an equijoin, 
the "trick" will be if we previously perform a projection of the 
"expression.field" to get a RexInputRef for it, e.g.:
{code}
SELECT a1.* from authors a1 WHERE a1.birthPlace.country IN
  (SELECT a2.birthPlace.country FROM authors a2
   WHERE ...)
{code}

This semijoin will NOT work (condition not considered as equijoin condition):
{code:java}
relBuilder.scan("authors").as("a1");
relBuilder.scan("authors").as("a2");
relBuilder.filter(...);
RexNode condition = relBuilder.equals(
    relBuilder.field(relBuilder.field(2, 0, "birthPlace"), "country"),
    relBuilder.field(relBuilder.field(2, 1, "birthPlace"), "country"));
relBuilder.semiJoin(condition);  // NOT considered as equijoin
{code}

But this semijoin, on the same fields will work, just because we have performed 
a projection to be able to access 'country' via an "artificial" RexInputRef 
instead of the "default" RexFieldAccess.RexInputRef:
{code:java}
relBuilder.scan("authors").as("a1");
relBuilder.projectPlus(relBuilder.field(relBuilder.field("a1", "birthPlace"), 
"country")); 
// a1.birthPlace.country projected as a1's field 4
relBuilder.scan("authors").as("a2");
relBuilder.filter(...);
relBuilder.projectPlus(relBuilder.field(relBuilder.field("a2", "birthPlace"), 
"country")); 
// a2.birthPlace.country projected as a2's field 4
RexNode condition = relBuilder.equals(
    relBuilder.field(2, 0, 4),
    relBuilder.field(2, 1, 4));
relBuilder.semiJoin(condition);  // considered as equijoin
{code}

I think this behavioral difference is confusing and prevents us from defining 
valid semijoins on structs' fields easily (without having to perform an extra 
and, I think, avoidable projection)

> RelOptUtil#splitJoinCondition must consider RexFieldAccess referencing 
> RexInputRef
> ----------------------------------------------------------------------------------
>
>                 Key: CALCITE-2898
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2898
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>            Reporter: Ruben Quesada Lopez
>            Assignee: Ruben Quesada Lopez
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{RelOptUtil#splitJoinCondition}} "Splits out the equi-join components of a 
> join condition, and returns what's left (remaining join filters that are not 
> equijoins)". This works fine in case of RexInputRef operands in the condition 
> (e.g. $0 = $1), but if any of the operands is a RexFieldAccess referencing a 
> RexInputRef (e.g. $0 = $1.id), then the condition will NOT be detected as an 
> equi-join and will be returned as if it were a non-equijoin. 
> This can lead to undesired consequences, e.g {{JoinInfo#of}} would return a 
> NonEquiJoinInfo object instead of an EquiJoinInfo object, which can generate 
> problems if, for example, we are creating a SemiJoin (which requires an 
> EquiJoinInfo object)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to