[ https://issues.apache.org/jira/browse/CALCITE-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138997#comment-17138997 ]

Danny Chen commented on CALCITE-4056:
-------------------------------------

After CALCITE-3786, the description in this JIRA issue is no longer valid: we
have removed the string digest from RexNode and RelNode, and the benchmark data
is also impressive.

The only difference in this patch is that it keeps an additional RelDigest, with
its hashCode cached, but without the RelNode properties.
As I said in CALCITE-3786, to avoid the duplicate object references you would
have to implement #hashCode and #equals for all RelNodes (both logical and
physical), and keeping the three methods #explainTerms, #equals and #hashCode
consistent, so that each includes the correct variables, is error-prone. This is
also a breaking change for downstream projects: I expect they would *have to*
implement the two additional methods correctly for every node.
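To make the trade-off concrete, here is a minimal Java sketch of the "fewer interfaces" approach, assuming a design in which each node lists its identifying fields in a single explainTerms-style method and a shared digest object derives equals and a cached hashCode from that list. SketchDigest, SketchNode, SketchFilter, SketchScan and DigestDemo are hypothetical names for illustration only; this is not Calcite's RelDigest API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical digest: node class plus the terms it reports, hash computed once. */
final class SketchDigest {
  private final Class<?> nodeClass;
  private final List<Object> terms;
  private final int hash; // cached so repeated memo lookups do not recompute it

  SketchDigest(Class<?> nodeClass, List<Object> terms) {
    this.nodeClass = nodeClass;
    this.terms = Collections.unmodifiableList(new ArrayList<>(terms));
    this.hash = Objects.hash(nodeClass, this.terms);
  }

  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SketchDigest)) return false;
    SketchDigest that = (SketchDigest) o;
    // Cheap hash check first, then the full comparison.
    return hash == that.hash && nodeClass == that.nodeClass && terms.equals(that.terms);
  }

  @Override public int hashCode() { return hash; }

  @Override public String toString() { return nodeClass.getSimpleName() + terms; }
}

/** Hypothetical base node: subclasses declare their fields once, in explainTerms. */
abstract class SketchNode {
  private SketchDigest digest; // built lazily on first use

  /** The single place where a node says which fields identify it. */
  protected abstract void explainTerms(List<Object> terms);

  final SketchDigest getDigest() {
    if (digest == null) {
      List<Object> terms = new ArrayList<>();
      explainTerms(terms);
      digest = new SketchDigest(getClass(), terms);
    }
    return digest;
  }
}

/** Hypothetical scan node: no hand-written equals or hashCode. */
final class SketchScan extends SketchNode {
  final String table;

  SketchScan(String table) { this.table = table; }

  @Override protected void explainTerms(List<Object> terms) { terms.add(table); }
}

/** Hypothetical filter node: inputs are compared by their digests in this sketch. */
final class SketchFilter extends SketchNode {
  final SketchNode input;
  final String condition;

  SketchFilter(SketchNode input, String condition) {
    this.input = input;
    this.condition = condition;
  }

  @Override protected void explainTerms(List<Object> terms) {
    terms.add(input.getDigest());
    terms.add(condition);
  }
}

final class DigestDemo {
  public static void main(String[] args) {
    Map<SketchDigest, SketchNode> memo = new HashMap<>();
    SketchNode a = new SketchFilter(new SketchScan("emp"), "deptno = 10");
    SketchNode b = new SketchFilter(new SketchScan("emp"), "deptno = 10");
    memo.putIfAbsent(a.getDigest(), a);
    // putIfAbsent returns the already-registered node, so b is detected as a duplicate.
    SketchNode canonical = memo.putIfAbsent(b.getDigest(), b);
    System.out.println(canonical == a);
  }
}
```

Because identity comes only from the terms a node reports, adding a field touches one method instead of three; the cost is one extra digest object per node, which the comment above argues should not be a bottleneck.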

The object references should not be a bottleneck, and I expect the benchmark
results for this patch to be almost the same as before. Given that, I prefer
fewer interfaces and more concise code.

> Remove Digest from RelNode and RexNode
> --------------------------------------
>
>                 Key: CALCITE-4056
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4056
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Haisheng Yuan
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The digest is used everywhere (RelNode, RexNode, DataType), which easily causes
> OOM for large queries or complex expressions. Data types are cached in a
> global interner and can be reused. Unlike RelNode, RexNode is not stored in
> the MEMO as a GROUP, so it cannot be shared. This prevents Calcite from
> scaling to large queries, e.g. CALCITE-3784.
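The description contrasts data types, which a global interner lets many nodes reuse, with RexNode, which has no such sharing. A minimal sketch of the interning idea, assuming a hypothetical immutable SketchType value class and a ConcurrentHashMap-backed TypeInterner (neither is Calcite's actual type factory): structurally equal instances collapse to one canonical object, so a large query holds a single copy per distinct type.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical immutable type descriptor with value-based equality. */
final class SketchType {
  final String name;
  final boolean nullable;

  SketchType(String name, boolean nullable) {
    this.name = Objects.requireNonNull(name);
    this.nullable = nullable;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof SketchType)) return false;
    SketchType that = (SketchType) o;
    return name.equals(that.name) && nullable == that.nullable;
  }

  @Override public int hashCode() { return Objects.hash(name, nullable); }
}

/** Hypothetical global interner: one canonical instance per distinct value. */
final class TypeInterner {
  private static final ConcurrentMap<SketchType, SketchType> CACHE = new ConcurrentHashMap<>();

  private TypeInterner() {}

  static SketchType intern(SketchType type) {
    // putIfAbsent returns the existing canonical instance, or null if 'type' was just stored.
    SketchType existing = CACHE.putIfAbsent(type, type);
    return existing != null ? existing : type;
  }
}
```

This sketch keeps strong references for simplicity; a production interner would typically use weak references so that types no longer in use can be garbage-collected.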



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
