[ https://issues.apache.org/jira/browse/CALCITE-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138997#comment-17138997 ]
Danny Chen commented on CALCITE-4056:
-------------------------------------
After CALCITE-3786, the description in this JIRA issue is no longer valid: we
have removed the string digest for RexNode and RelNode, and the benchmark
results are also impressive.
The only difference in this patch is that it keeps a separate RelDigest, with
its hashCode cached but without the RelNode properties.
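To make the digest idea concrete, here is a minimal sketch that assumes hypothetical `Node` and `Digest` classes (not Calcite's actual RelNode or RelDigest API). The digest compares nodes structurally instead of through a string, caches its hash code, and serves as the key of a memo map that hands back the canonical copy when a duplicate node is registered.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DigestDemo {
  /** Hypothetical, simplified relational node; not Calcite's RelNode API. */
  static final class Node {
    final String op;          // operator name, e.g. "Filter"
    final List<Object> attrs; // attributes that identify the node
    final List<Node> inputs;  // child nodes
    private final Digest digest;

    Node(String op, List<Object> attrs, List<Node> inputs) {
      this.op = op;
      this.attrs = List.copyOf(attrs);
      this.inputs = List.copyOf(inputs);
      this.digest = new Digest(this);
    }

    Digest digest() { return digest; }
    // Node itself keeps Object's identity-based equals/hashCode.
  }

  /** Hypothetical digest: structural equality plus a lazily cached hash code. */
  static final class Digest {
    private final Node node;
    private int hash; // 0 means "not computed yet" (same trick as String.hashCode)

    Digest(Node node) { this.node = node; }

    @Override public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Digest)) return false;
      Node other = ((Digest) o).node;
      // Inputs compare by reference because Node does not override equals;
      // this assumes inputs were already deduplicated (canonical).
      return node.op.equals(other.op)
          && node.attrs.equals(other.attrs)
          && node.inputs.equals(other.inputs);
    }

    @Override public int hashCode() {
      int h = hash;
      if (h == 0) {
        h = Objects.hash(node.op, node.attrs, node.inputs);
        hash = h; // cached; a genuine hash of 0 is simply recomputed
      }
      return h;
    }
  }

  public static void main(String[] args) {
    Node scan = new Node("Scan", List.of("EMP"), List.of());
    Node f1 = new Node("Filter", List.of("$1 > 10"), List.of(scan));
    Node f2 = new Node("Filter", List.of("$1 > 10"), List.of(scan));
    Map<Digest, Node> memo = new HashMap<>(); // digest -> canonical node
    memo.putIfAbsent(f1.digest(), f1);
    Node canonical = memo.computeIfAbsent(f2.digest(), d -> f2);
    System.out.println(canonical == f1); // prints true: f2 is a duplicate of f1
  }
}
```

Comparing inputs by reference keeps each equality check shallow, but it only works if children are registered (deduplicated) before their parents.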
As I said in CALCITE-3786, to avoid duplicate object references you have to
implement #hashCode and #equals for every RelNode (both logical and physical),
and keeping the three methods #explainTerms, #equals and #hashCode consistent,
so that each includes the correct variables, is error-prone. This is also a
breaking change for downstream projects: I can imagine they *have to*
implement the two additional methods correctly for each node.
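The maintenance concern above comes from each node type having to list the same fields in #explainTerms, #equals and #hashCode. One way to keep them from drifting apart, sketched here with a hypothetical `BaseNode` class and a hypothetical `terms()` method (neither is part of Calcite), is to derive all three from a single list of terms.

```java
import java.util.List;
import java.util.Objects;

public class TermsDemo {
  /** Hypothetical base class: subclasses declare their identifying terms once. */
  abstract static class BaseNode {
    /** Single source of truth for explain output, equals and hashCode. */
    protected abstract List<Object> terms();

    /** Explain string built from the same terms that drive equality. */
    public String explain() {
      return getClass().getSimpleName() + terms();
    }

    @Override public final boolean equals(Object o) {
      return o != null && o.getClass() == getClass()
          && terms().equals(((BaseNode) o).terms());
    }

    @Override public final int hashCode() {
      return Objects.hash(getClass(), terms());
    }
  }

  /** Hypothetical filter node with a condition and a fetch limit. */
  static final class Filter extends BaseNode {
    final String condition;
    final int fetch;

    Filter(String condition, int fetch) {
      this.condition = condition;
      this.fetch = fetch;
    }

    @Override protected List<Object> terms() {
      return List.of(condition, fetch); // adding a field here updates all three
    }
  }

  public static void main(String[] args) {
    Filter a = new Filter("$0 = 1", 10);
    Filter b = new Filter("$0 = 1", 10);
    Filter c = new Filter("$0 = 1", 20);
    System.out.println(a.equals(b) + " " + a.equals(c)); // prints "true false"
    System.out.println(a.explain());                     // prints "Filter[$0 = 1, 10]"
  }
}
```

This pattern only reduces the risk of inconsistency; each node type still has to declare its terms correctly, which is the per-node burden the comment is concerned about.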
Object references should not be a bottleneck, and the benchmark for this patch
should come out almost the same as before. Based on that, I prefer fewer
interfaces and more concise code.
> Remove Digest from RelNode and RexNode
> --------------------------------------
>
> Key: CALCITE-4056
> URL: https://issues.apache.org/jira/browse/CALCITE-4056
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Haisheng Yuan
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The digest is used everywhere (RelNode, RexNode, DataType), which easily
> causes OOM for large queries or complex expressions. DataType is cached in a
> global interner and can be reused. Unlike RelNode, RexNode is not stored in
> the MEMO as a GROUP, so it can't be shared. This prevents Calcite from
> scaling to large queries, e.g. CALCITE-3784.
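The quoted description notes that data types are cached in a global interner, so equal types share a single instance, whereas RexNode objects are not shared. A minimal sketch of interning, assuming a hypothetical `TypeKey` record in place of Calcite's actual data type classes and a hand-written `Interner` rather than any library API:

```java
import java.util.concurrent.ConcurrentHashMap;

public class InternDemo {
  /** Hypothetical immutable type descriptor; records give value-based equals/hashCode. */
  record TypeKey(String name, int precision, boolean nullable) {}

  /** Minimal strong interner: one canonical instance per distinct value (never evicts). */
  static final class Interner<T> {
    private final ConcurrentHashMap<T, T> pool = new ConcurrentHashMap<>();

    T intern(T value) {
      T existing = pool.putIfAbsent(value, value);
      return existing == null ? value : existing; // first instance wins
    }

    int size() { return pool.size(); }
  }

  public static void main(String[] args) {
    Interner<TypeKey> types = new Interner<>();
    TypeKey a = types.intern(new TypeKey("VARCHAR", 20, true));
    TypeKey b = types.intern(new TypeKey("VARCHAR", 20, true));
    System.out.println(a == b);       // prints true: both refer to the same object
    System.out.println(types.size()); // prints 1
  }
}
```

A strong interner like this one never releases entries; a weak-reference interner (for example Guava's `Interners.newWeakInterner()`) lets instances that are no longer referenced be garbage-collected.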