[jira] [Created] (CALCITE-5842) LogicalProject deepHashCode creates duplicate value with different RowType

Yu Tian (Jira) Wed, 12 Jul 2023 14:45:10 -0700

Yu Tian created CALCITE-5842:
--------------------------------

             Summary: LogicalProject deepHashCode creates duplicate value with 
different RowType 
                 Key: CALCITE-5842
                 URL: https://issues.apache.org/jira/browse/CALCITE-5842
             Project: Calcite
          Issue Type: Bug
    Affects Versions: 1.32.0
            Reporter: Yu Tian



The LogicalProject class has deepEquals0 and deepHashCode0 methods. The 
deepEquals0 method treats getRowType() as one of its equality criteria, but 
deepHashCode0 does not include getRowType() when generating the hash value. 
Is this intentional, or is it a bug?

[https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Project.java#L348,L368]
 

 

We ask because we ran into this in 2 use cases on our side.

In the first one, two LogicalTableScans with similar configurations are each 
connected to a separate LogicalFilter, and we then LogicalJoin the two together. 
One issue we noticed is that HepPlanner contains the following logic:

 
In class org.apache.calcite.plan.hep.HepPlanner:
{code:java}
// try to find equivalent rel only if DAG is allowed
if (!noDag) {
  // Now, check if an equivalent vertex already exists in graph.
  HepRelVertex equivVertex = mapDigestToVertex.get(rel.getRelDigest());
  if (equivVertex != null) {
    // Use existing vertex.
    return equivVertex;
  }
} {code}
 

The 2 LogicalProjects above the 2 LogicalTableScans have the same hashCode 
value, because the deepHashCode method in LogicalProject does not consider the 
getRowType() value. As a result, the planner replaces LogicalTableScan2 with 
LogicalTableScan1, although the two should be treated as separate items. 
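
The reuse described above can be reproduced in miniature with a plain HashMap. This is only a sketch, not Calcite code: `Node`, `Digest`, and `register` are hypothetical stand-ins, and the sketch assumes a digest key that ignores the row type in both equals and hashCode. A Java HashMap treats two keys as the same only when equals also returns true, so the sketch builds that assumption into the key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DigestLookupSketch {
  /** Hypothetical stand-in for a project node; not a Calcite class. */
  static final class Node {
    final String name;
    final List<String> exps;
    final List<String> rowType;

    Node(String name, List<String> exps, List<String> rowType) {
      this.name = name;
      this.exps = exps;
      this.rowType = rowType;
    }
  }

  /** Hypothetical digest key: compares expressions only, ignores rowType (assumption). */
  static final class Digest {
    final Node node;

    Digest(Node node) {
      this.node = node;
    }

    @Override public boolean equals(Object o) {
      return o instanceof Digest && node.exps.equals(((Digest) o).node.exps);
    }

    @Override public int hashCode() {
      return Objects.hash(node.exps);
    }
  }

  /** Returns an already registered node with an equal digest, else registers the given node. */
  static Node register(Map<Digest, Node> mapDigestToNode, Node node) {
    Digest digest = new Digest(node);
    Node equiv = mapDigestToNode.get(digest);
    if (equiv != null) {
      // Use the existing node, as in the planner snippet quoted above.
      return equiv;
    }
    mapDigestToNode.put(digest, node);
    return node;
  }

  public static void main(String[] args) {
    Map<Digest, Node> map = new HashMap<>();
    Node project1 = new Node("project1",
        Arrays.asList("$0", "$1"), Arrays.asList("id", "name"));
    Node project2 = new Node("project2",
        Arrays.asList("$0", "$1"), Arrays.asList("id", "title"));
    System.out.println(register(map, project1).name); // project1
    System.out.println(register(map, project2).name); // project1 (project2 is dropped)
  }
}
```

In this sketch the second node is silently replaced by the first even though their row types differ, which matches the symptom we observe.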

 

In the second use case we have 2 diagrams, each consisting of a 
LogicalTableScan, a LogicalFilter, and a LogicalTableModify. The 
LogicalTableScans have similar setups but different rowType information. This 
time HepPlanner passes, because each diagram goes through a separate HepPlanner 
stage, so the issue above does not occur. However, when the diagrams reach the 
VolcanoPlanner, the following logic applies:

 
In class org.apache.calcite.plan.volcano.VolcanoPlanner:
{code:java}
// If it is equivalent to an existing expression, return the set that
// the equivalent expression belongs to.
RelDigest digest = rel.getRelDigest();
RelNode equivExp = mapDigestToRel.get(digest); {code}
 

The map replaces LogicalTableScan1 with LogicalTableScan2 at the LogicalProject 
stage because the two projects have the same hashCode. The map reuses the 
RelNode it processed earlier, which causes the issue.

 

Here are our proposals:

 
 * Narrow scope change: LogicalProject is the most frequently used project 
type, so we change only it.
 ** Modify the LogicalProject deepHashCode method to: 

{code:java}
@Override public int deepHashCode() {
  return Objects.hash(traitSet, input.deepHashCode(), exps, hints,
      getRowType());
}{code}
Including the getRowType() value in the hash generation will resolve the issue, 
since the rowType contains the field names and data type information. 

 
 * Whole scope change: change the deepHashCode method in the Project class.
 ** The change is similar to the one above, but its scope is wider than the 
first one.
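
To illustrate the effect of either proposal, here is a minimal self-contained sketch. It does not use Calcite classes: `ProjectLike`, `hashWithoutRowType`, `hashWithRowType`, and `deepEquals` are hypothetical names, and the row type is modelled as a plain list of field names. It only shows that adding the row type to Objects.hash brings the hash in line with an equality check that already compares row types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class RowTypeHashSketch {
  /** Hypothetical stand-in for a project node; not a Calcite class. */
  static final class ProjectLike {
    final List<String> exps;
    final List<String> rowType;

    ProjectLike(List<String> exps, List<String> rowType) {
      this.exps = exps;
      this.rowType = rowType;
    }
  }

  /** Equality that compares the row type, mirroring what deepEquals0 is described as doing. */
  static boolean deepEquals(ProjectLike a, ProjectLike b) {
    return a.exps.equals(b.exps) && a.rowType.equals(b.rowType);
  }

  /** Hash that leaves the row type out, as in the behaviour reported above. */
  static int hashWithoutRowType(ProjectLike p) {
    return Objects.hash(p.exps);
  }

  /** Hash that also covers the row type, as in the proposed change. */
  static int hashWithRowType(ProjectLike p) {
    return Objects.hash(p.exps, p.rowType);
  }

  public static void main(String[] args) {
    ProjectLike p1 = new ProjectLike(
        Arrays.asList("$0", "$1"), Arrays.asList("id", "name"));
    ProjectLike p2 = new ProjectLike(
        Arrays.asList("$0", "$1"), Arrays.asList("id", "title"));

    System.out.println(deepEquals(p1, p2));                                 // false
    System.out.println(hashWithoutRowType(p1) == hashWithoutRowType(p2));   // true
    System.out.println(hashWithRowType(p1) == hashWithRowType(p2));         // false
  }
}
```

With the row type included, the two nodes in this sketch no longer share a hash value, while nodes that are equal under deepEquals still hash identically.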

 

Is this something we could consider improving in an upcoming release of Apache 
Calcite?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
