[ https://issues.apache.org/jira/browse/CALCITE-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruben Q L updated CALCITE-4995:
-------------------------------
    Description: 
It seems {{RelFieldTrimmer}} can cause an {{AssertionError}} (or later an 
{{ArrayIndexOutOfBoundsException}} if assertions are disabled) on certain plans 
involving SEMI/ANTI joins (i.e. joins that do NOT project the RHS fields).
The root cause seems to be the "early return" in 
{{RelFieldTrimmer#trimFields(Join join, ImmutableBitSet fieldsUsed, 
Set<RelDataTypeField> extraFields)}}, taken when nothing has been trimmed in the 
join's inputs (so the join itself can be returned as is):
{code:java}
    if (changeCount == 0
        && mapping.isIdentity()) {
      return result(join, Mappings.createIdentity(fieldCount));
    }
{code}
The problem is that this {{fieldCount}} is the sum of the LHS and RHS field 
counts (plus system fields), whereas for a SEMI/ANTI join the returned mapping 
must not include the RHS fields, since these join types do not project them.
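To make the size mismatch concrete, here is a minimal, Calcite-free sketch. It models a mapping as a plain {{int[]}} instead of Calcite's {{Mapping}}, and all class and method names are hypothetical. The "fixed" variant sizes the identity by the join's output row type, roughly {{Mappings.createIdentity(join.getRowType().getFieldCount())}} in the real code; that is an assumed fix, not necessarily the committed patch.

```java
import java.util.Arrays;

/**
 * Hypothetical model (no Calcite dependency) of the early-return mapping in
 * RelFieldTrimmer#trimFields(Join, ...). A mapping is an int[] where
 * index = source field and value = target field.
 */
public class SemiJoinIdentityMappingSketch {

  enum JoinType { INNER, LEFT, SEMI, ANTI }

  /** Number of fields the join actually outputs (the size of its row type). */
  static int outputFieldCount(JoinType type, int systemFields, int leftFields,
      int rightFields) {
    // SEMI/ANTI joins project only the system fields and the LHS fields.
    if (type == JoinType.SEMI || type == JoinType.ANTI) {
      return systemFields + leftFields;
    }
    return systemFields + leftFields + rightFields;
  }

  /** Identity mapping of size n, analogous to Mappings.createIdentity(n). */
  static int[] identity(int n) {
    int[] m = new int[n];
    for (int i = 0; i < n; i++) {
      m[i] = i;
    }
    return m;
  }

  /** Current early return: always sized as system + LHS + RHS fields. */
  static int[] earlyReturnMappingBuggy(int sys, int left, int right) {
    int fieldCount = sys + left + right;
    return identity(fieldCount);
  }

  /** Assumed fix: size the identity by the join's output row type. */
  static int[] earlyReturnMappingFixed(JoinType type, int sys, int left,
      int right) {
    return identity(outputFieldCount(type, sys, left, right));
  }

  public static void main(String[] args) {
    // SEMI join, no system fields, LHS with 3 fields, RHS with 2 fields.
    int rowTypeSize = outputFieldCount(JoinType.SEMI, 0, 3, 2);
    int[] buggy = earlyReturnMappingBuggy(0, 3, 2);
    int[] fixed = earlyReturnMappingFixed(JoinType.SEMI, 0, 3, 2);
    System.out.println("row type size = " + rowTypeSize);            // 3
    System.out.println("buggy mapping size = " + buggy.length);      // 5
    System.out.println("fixed mapping = " + Arrays.toString(fixed)); // [0, 1, 2]
  }
}
```

The buggy identity has 5 sources for a join whose row type has only 3 fields, which is the kind of inconsistency that can trip an assertion or, with assertions off, an out-of-bounds access further up the trimming.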

The problem only occurs on this path (when the trimmer leaves the join 
unchanged). A few lines below, in the method's other return path (when 
something has been trimmed), the mapping is handled specially for SEMI/ANTI, 
so that case works correctly:
{code:java}
    switch (join.getJoinType()) {
    case SEMI:
    case ANTI:
      // For SemiJoins and AntiJoins only map fields from the left-side
      if (join.getJoinType() == JoinRelType.SEMI) {
        relBuilder.semiJoin(newConditionExpr);
      } else {
        relBuilder.antiJoin(newConditionExpr);
      }
      Mapping inputMapping = inputMappings.get(0);
      mapping = Mappings.create(MappingType.INVERSE_SURJECTION,
          join.getRowType().getFieldCount(),
          newSystemFieldCount + inputMapping.getTargetCount());
      for (int i = 0; i < newSystemFieldCount; ++i) {
        mapping.set(i, i);
      }
      offset = systemFieldCount;
      newOffset = newSystemFieldCount;
      for (IntPair pair : inputMapping) {
        mapping.set(pair.source + offset, pair.target + newOffset);
      }
      break;
    default:
      relBuilder.join(join.getJoinType(), newConditionExpr);
    }
    relBuilder.hints(join.getHints());
    return result(relBuilder.build(), mapping);
{code}
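The SEMI/ANTI branch above can be modeled without Calcite roughly as follows. This is a hypothetical sketch: the mapping is an {{int[]}} indexed by source field, with -1 marking a trimmed field, instead of a {{MappingType.INVERSE_SURJECTION}} mapping. It illustrates that only system fields and LHS fields ever receive a target, so the result lines up with the join's row type.

```java
import java.util.Arrays;

/**
 * Hypothetical, Calcite-free model of the SEMI/ANTI mapping built on the
 * trimmed path: system fields map to themselves, then each (source, target)
 * pair of the LHS input mapping is shifted by the old/new system-field
 * offsets. RHS fields are never mapped. -1 means "trimmed".
 */
public class SemiJoinTrimMappingSketch {

  static int[] semiAntiMapping(int systemFieldCount, int newSystemFieldCount,
      int[] leftInputMapping, int joinFieldCount) {
    int[] mapping = new int[joinFieldCount];
    Arrays.fill(mapping, -1);
    // Keep system fields at the same positions.
    for (int i = 0; i < newSystemFieldCount; ++i) {
      mapping[i] = i;
    }
    int offset = systemFieldCount;
    int newOffset = newSystemFieldCount;
    // Shift every surviving LHS field by the system-field offsets.
    for (int source = 0; source < leftInputMapping.length; source++) {
      int target = leftInputMapping[source];
      if (target >= 0) {
        mapping[source + offset] = target + newOffset;
      }
    }
    return mapping;
  }

  public static void main(String[] args) {
    // One system field; the LHS has 3 fields and field 1 was trimmed,
    // so the LHS input mapping is 0 -> 0, 2 -> 1.
    int[] leftInputMapping = {0, -1, 1};
    // The SEMI join's row type: 1 system field + 3 LHS fields (no RHS).
    int[] mapping = semiAntiMapping(1, 1, leftInputMapping, 4);
    System.out.println(Arrays.toString(mapping)); // [0, 1, -1, 2]
  }
}
```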

  was:
(Unit test to be provided)

It seems {{RelFieldTrimmer}} can cause an {{AssertionError}} (or later an 
{{ArrayIndexOutOfBoundsException}} if assertions are disabled) on certain plans 
involving SEMI/ANTI join (i.e. joins that do NOT project the RHS fields).
The root cause seems to be the "early return" in 
{{RelFieldTrimmer#trimFields(Join join, ImmutableBitSet fieldsUsed, 
Set<RelDataTypeField> extraFields)}} when nothing has been trimmed inside 
join's inputs (so the join itself can be returned as is):
{code:java}
    if (changeCount == 0
        && mapping.isIdentity()) {
      return result(join, Mappings.createIdentity(fieldCount));
    }
{code}
The problem is that this {{fieldCount}} is an addition of LHS + RHS fields (+ 
system fields); but in case of a SEMI/ANTI the mappings to be returned must not 
consider RHS fields (since they are not projected by these join types).

The problem only happens here (when the trimmer does not trim the join). Notice 
that, a few lines below, in the "other return scenario" of the method (when 
something has been trimmed), there is a special treatment of the mapping for 
ANTI/SEMI, so things will work fine in this case:
{code:java}
    switch (join.getJoinType()) {
    case SEMI:
    case ANTI:
      // For SemiJoins and AntiJoins only map fields from the left-side
      if (join.getJoinType() == JoinRelType.SEMI) {
        relBuilder.semiJoin(newConditionExpr);
      } else {
        relBuilder.antiJoin(newConditionExpr);
      }
      Mapping inputMapping = inputMappings.get(0);
      mapping = Mappings.create(MappingType.INVERSE_SURJECTION,
          join.getRowType().getFieldCount(),
          newSystemFieldCount + inputMapping.getTargetCount());
      for (int i = 0; i < newSystemFieldCount; ++i) {
        mapping.set(i, i);
      }
      offset = systemFieldCount;
      newOffset = newSystemFieldCount;
      for (IntPair pair : inputMapping) {
        mapping.set(pair.source + offset, pair.target + newOffset);
      }
      break;
    default:
      relBuilder.join(join.getJoinType(), newConditionExpr);
    }
    relBuilder.hints(join.getHints());
    return result(relBuilder.build(), mapping);
{code}


> AssertionError caused by RelFieldTrimmer on SEMI/ANTI join
> ----------------------------------------------------------
>
>                 Key: CALCITE-4995
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4995
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.29.0
>            Reporter: Ruben Q L
>            Assignee: Ruben Q L
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
