[jira] [Created] (CALCITE-6468) RelDecorrelator throws AssertionError if correlated variable is used as Aggregate group key

Ruben Q L (Jira) Fri, 12 Jul 2024 10:19:33 -0700

Ruben Q L created CALCITE-6468:
----------------------------------

             Summary: RelDecorrelator throws AssertionError if correlated variable is used as Aggregate group key
                 Key: CALCITE-6468
                 URL: https://issues.apache.org/jira/browse/CALCITE-6468
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.37.0
            Reporter: Ruben Q L
            Assignee: Ruben Q L
             Fix For: 1.38.0



The problem can be reproduced with this query (a simplified version of TPC-DS 
query 1):
{code:sql}
WITH agg_sal AS
  (SELECT deptno, sum(sal) AS total FROM emp GROUP BY deptno)
SELECT 1 FROM agg_sal s1
WHERE s1.total > (SELECT avg(total) FROM agg_sal s2 WHERE s1.deptno = s2.deptno)
{code}

If we apply the subquery program and FilterAggregateTransposeRule, and then 
call the RelDecorrelator, it fails with:
{noformat}
java.lang.AssertionError
        at org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:581)
        at org.apache.calcite.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:495)
        ...
{noformat}

The problem appears in this assert (RelDecorrelator.java:581):
{code}
assert newPos == newInputOutput.size();
{code}

The root cause seems to be that, a few lines earlier, when the correlating 
variables from {{corDefOutputs}} are processed, an entry is inserted into 
{{mapNewInputToProjOutputs}}:
{code}
if (!frame.corDefOutputs.isEmpty()) {
  for (Map.Entry<CorDef, Integer> entry : frame.corDefOutputs.entrySet()) {
    RexInputRef.add2(projects, entry.getValue(), newInputOutput);
    corDefOutputs.put(entry.getKey(), newPos);
    mapNewInputToProjOutputs.put(entry.getValue(), newPos); // <-- HERE
    newPos++;
  }
}
{code}

The problem is that this input position was already in the map, because it had 
been inserted earlier as part of the group key processing:
{code}
for (int i = 0; i < oldGroupKeyCount; i++) {
  final int idx = groupKeyIndices.get(i);
  ...
  // add mapping of group keys.
  outputMap.put(idx, newPos);
  int newInputPos = requireNonNull(frame.oldToNewOutputs.get(idx));
  RexInputRef.add2(projects, newInputPos, newInputOutput);
  mapNewInputToProjOutputs.put(newInputPos, newPos); // <-- HERE: added first
  newPos++;
}
{code}

Therefore, the unnecessary insertion into {{mapNewInputToProjOutputs}} and the 
subsequent increment of {{newPos}} when the {{CorDef}}s are processed lead to 
the mismatch: {{newPos}} ends up larger than {{newInputOutput.size()}}.
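
To make the counting concrete, here is a minimal stand-alone sketch that replays the three loops of the method (group keys, CorDefs, remaining fields) using plain integers only. It is not Calcite code: the class {{NewPosMismatchSketch}}, the method {{finalNewPos}} and its parameters are hypothetical names chosen for illustration, and the field count of 3 is an arbitrary assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the position bookkeeping; not the actual Calcite code. */
public class NewPosMismatchSketch {

  /**
   * Replays the group-key loop, the CorDef loop (without any containsKey check)
   * and the remaining-fields loop, and returns the final value of newPos.
   */
  public static int finalNewPos(int inputFieldCount, List<Integer> groupKeyInputs,
      List<Integer> corDefInputs) {
    Map<Integer, Integer> mapNewInputToProjOutputs = new HashMap<>();
    int newPos = 0;

    // Group keys: each one is projected and recorded in the map.
    for (int newInputPos : groupKeyInputs) {
      mapNewInputToProjOutputs.put(newInputPos, newPos);
      newPos++;
    }

    // CorDefs: projected unconditionally, even if the input position
    // was already projected as a group key.
    for (int corDefInput : corDefInputs) {
      mapNewInputToProjOutputs.put(corDefInput, newPos);
      newPos++;
    }

    // Remaining fields: only those not yet in the map are projected.
    for (int i = 0; i < inputFieldCount; i++) {
      if (!mapNewInputToProjOutputs.containsKey(i)) {
        mapNewInputToProjOutputs.put(i, newPos);
        newPos++;
      }
    }
    return newPos;
  }

  public static void main(String[] args) {
    // Correlated variable is also the group key (input position 0).
    System.out.println(finalNewPos(3, Arrays.asList(0), Arrays.asList(0))); // prints 4
    // Correlated variable is a different input position.
    System.out.println(finalNewPos(3, Arrays.asList(0), Arrays.asList(1))); // prints 3
  }
}
```

With three input fields, the first call ends at 4 instead of 3, which is the situation where {{assert newPos == newInputOutput.size()}} would fail; the second call ends at 3 as expected.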

Notice how, right before the assertion, when processing the remaining fields, 
it is verified that the value is not already contained in 
{{mapNewInputToProjOutputs}}:
{code}
// add the remaining fields
final int newGroupKeyCount = newPos;
for (int i = 0; i < newInputOutput.size(); i++) {
  if (!mapNewInputToProjOutputs.containsKey(i)) { // <-- HERE checked
    RexInputRef.add2(projects, i, newInputOutput);
    mapNewInputToProjOutputs.put(i, newPos);
    newPos++;
  }
}
{code}

Thus, the solution is probably to apply the same check when the {{CorDef}}s 
are processed:
{code}
if (!frame.corDefOutputs.isEmpty()) {
  for (Map.Entry<CorDef, Integer> entry : frame.corDefOutputs.entrySet()) {
    Integer pos = mapNewInputToProjOutputs.get(entry.getValue());
    if (pos == null) {
      RexInputRef.add2(projects, entry.getValue(), newInputOutput);
      corDefOutputs.put(entry.getKey(), newPos);
      mapNewInputToProjOutputs.put(entry.getValue(), newPos);
      newPos++;
    } else {
      corDefOutputs.put(entry.getKey(), pos);
    }
  }
}
{code}
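
As a rough sanity check of the proposed logic, the following stand-alone sketch models the same bookkeeping with plain integers and applies the lookup before projecting a CorDef. It is only an illustration under stated assumptions: {{CorDefDedupSketch}}, {{replay}} and the use of a list index as a stand-in for a {{CorDef}} key are hypothetical and not part of Calcite.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the proposed check; not the actual Calcite code. */
public class CorDefDedupSketch {

  /**
   * Replays the three loops with the lookup applied to CorDefs.
   * Returns {final newPos, output position of the first CorDef or -1 if none}.
   */
  public static int[] replay(int inputFieldCount, List<Integer> groupKeyInputs,
      List<Integer> corDefInputs) {
    Map<Integer, Integer> mapNewInputToProjOutputs = new HashMap<>();
    // Stand-in for corDefOutputs: index in corDefInputs -> output position.
    Map<Integer, Integer> corDefOutputs = new HashMap<>();
    int newPos = 0;

    // Group keys: each one is projected and recorded in the map.
    for (int newInputPos : groupKeyInputs) {
      mapNewInputToProjOutputs.put(newInputPos, newPos);
      newPos++;
    }

    // CorDefs: reuse the existing output position if the input was
    // already projected; otherwise project it and advance newPos.
    for (int c = 0; c < corDefInputs.size(); c++) {
      int input = corDefInputs.get(c);
      Integer pos = mapNewInputToProjOutputs.get(input);
      if (pos == null) {
        corDefOutputs.put(c, newPos);
        mapNewInputToProjOutputs.put(input, newPos);
        newPos++;
      } else {
        corDefOutputs.put(c, pos);
      }
    }

    // Remaining fields: only those not yet in the map are projected.
    for (int i = 0; i < inputFieldCount; i++) {
      if (!mapNewInputToProjOutputs.containsKey(i)) {
        mapNewInputToProjOutputs.put(i, newPos);
        newPos++;
      }
    }
    return new int[] {newPos, corDefOutputs.getOrDefault(0, -1)};
  }

  public static void main(String[] args) {
    // Correlated variable is also the group key (input position 0).
    System.out.println(Arrays.toString(replay(3, Arrays.asList(0), Arrays.asList(0)))); // [3, 0]
    // Correlated variable is a different input position.
    System.out.println(Arrays.toString(replay(3, Arrays.asList(0), Arrays.asList(1)))); // [3, 1]
  }
}
```

In both cases the final position equals the number of input fields (3), and when the correlated variable is also the group key, the CorDef simply points at the output position already assigned to that key.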



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
