Yu Tian created CALCITE-5842:
--------------------------------
Summary: LogicalProject deepHashCode creates duplicate value with
different RowType
Key: CALCITE-5842
URL: https://issues.apache.org/jira/browse/CALCITE-5842
Project: Calcite
Issue Type: Bug
Affects Versions: 1.32.0
Reporter: Yu Tian
The LogicalProject class has deepEquals0 and deepHashCode0 methods, in the
deepEquals0 method, it consider getRowType() as one equal standard, however, in
the deepHashCode0, it is missing the getRowType() to generated the hash value.
Do we do this on purpose or it is a bug?
[https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Project.java#L348,L368]
The reason we ask is that we are trying 2 use cases from our side.
The first one is two LogicalTableScan with similar configurations, which are
connected to 2 separate LogicalFiler, then we LogicalJoin these 2 together. One
issue we noticed is that, in HepPlanner, it has logics as below
package org.apache.calcite.plan.hep.HepPlanner
{code:java}
// try to find equivalent rel only if DAG is allowed
if (!noDag) {
// Now, check if an equivalent vertex already exists in graph.
HepRelVertex equivVertex = mapDigestToVertex.get(rel.getRelDigest());
if (equivVertex != null) {
// Use existing vertex.
return equivVertex;
}
} {code}
The 2 logicalProjects from the 2 LogicalTableScans have same hashCode value
based on the deepHashCode method in LogicalProject, because it didn’t consider
the getRowType() value, the planner is replacing LogicalTableScan2 with
LogicalTableScan1, in fact, we should treat them as separate items to process.
Another use case we have, we have 2 diagrams, each diagram with
LogicalTableScan, LogicalFiler, LogicalTableModify, LogicalTableScan have
similar setup with different rowType information. This time, HepPlanner is
passing, since it has separate HepPlanner stage, so above issue is not
happening. However, when it reach the VolcanoPlanner, the logics
package org.apache.calcite.plan.volcano.VolcanoPlanner
{code:java}
// If it is equivalent to an existing expression, return the set that
// the equivalent expression belongs to.
RelDigest digest = rel.getRelDigest();
RelNode equivExp = mapDigestToRel.get(digest); {code}
The map replace the LogicalTableScan1 with LogicalTableScan2 in the
LogicalProject stage since they have same hashCode, and the map is reusing
earlier processed RelNode, which caused the issues.
Here are the proposals we have,
* Narrow Scope change: LogicalProject is the most frequently used project
type, we only change it.
** Modify the LogicalProject method deepHashCode method to use
{code:java}
@Override public int deepHashCode() {
return Objects.hash(traitSet, input.deepHashCode(), exps, hints,
getRowType());
}{code}
Consider the getRowType() value in the hash generation will resolve the issue,
since the rowType contains the field names and data types information.
* Whole Scope change: Change the deepHashCode method in Project class.
** Similar change as above, however, the scope of this change is wide compared
to the first one.
Is it something we can consider to improve in the following release of Apache
Calcite?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)