Re: RelNode#getDescription and memory consumption

Julian Hyde Wed, 15 Aug 2018 16:56:05 -0700

I see now.

I think the problem only occurs when you call AbstractRelNode.recomputeDigest().


The first time the digest is computed, the input RelNodes have a digest (and 
desc) as it has been set in AbstractRelNode’s constructor: 

  this.digest = getRelTypeName() + "#" + id;
  this.desc = digest;

Explain writer uses the “desc” field to identify inputs, but maybe it should 
use id or type + id. Or maybe the “desc” field should be final.
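
To make the difference concrete, here is a minimal sketch. It uses a hypothetical Node class, not Calcite's AbstractRelNode, and the names DigestSketch, recomputeDigest(boolean) and rootDigestLength are made up for illustration. It assumes a recomputed digest embeds each input's current digest text, and compares that with referring to inputs by type + id only:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a RelNode; none of this is Calcite code.
class DigestSketch {
  static class Node {
    final int id;
    final String typeName;
    final List<Node> inputs;
    String digest;

    Node(int id, String typeName, Node... inputs) {
      this.id = id;
      this.typeName = typeName;
      this.inputs = Arrays.asList(inputs);
      // Same shape as the constructor quoted above: type name + "#" + id.
      this.digest = typeName + "#" + id;
    }

    // nested = true embeds each input's current digest;
    // nested = false refers to each input by type + id only.
    void recomputeDigest(boolean nested) {
      StringBuilder sb = new StringBuilder(typeName).append('(');
      for (int i = 0; i < inputs.size(); i++) {
        Node in = inputs.get(i);
        if (i > 0) {
          sb.append(',');
        }
        sb.append("input#").append(i).append('=')
            .append(nested ? in.digest : in.typeName + "#" + in.id);
      }
      digest = sb.append(')').toString();
    }
  }

  // Builds a chain of `depth` Project nodes over one Scan, recomputing each
  // digest bottom-up, and returns the length of the root's digest.
  static int rootDigestLength(int depth, boolean nested) {
    Node node = new Node(0, "Scan");
    for (int i = 1; i <= depth; i++) {
      node = new Node(i, "Project", node);
      node.recomputeDigest(nested);
    }
    return node.digest.length();
  }

  public static void main(String[] args) {
    System.out.println("nested:  " + rootDigestLength(100, true));
    System.out.println("shallow: " + rootDigestLength(100, false));
  }
}
```

In this sketch the nested digest of the root grows with the depth of the chain, and every intermediate node holds its own copy of the text beneath it, while the type + id form stays about the same size at any depth.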

By the way, the comment

  // Substring uses the same underlying array of chars, so saves a bit
  // of memory.

was true through JDK 1.6 (substring stopped sharing the array in JDK 7u6) but 
is no longer true.

Can you log a JIRA case, please?

Julian



> On Aug 15, 2018, at 2:37 PM, Laurent Goujon <[email protected]> wrote:
> 
> Sorry, I should have mentioned the method too: HepPlanner#buildFinalPlan
> (when running RelOptRulesTest#testWindowInParenthesis())
> 
> On Wed, Aug 15, 2018 at 2:36 PM Laurent Goujon <[email protected]> wrote:
> 
>> It seems to happen when building the final plan: the hep planner visits
>> each node recursively to recompute its digest. In that relnode tree,
>> there are no more HepRelVertex nodes, and the digest now includes the whole
>> description of the input(s).
>> 
>> On Wed, Aug 15, 2018 at 2:33 PM Julian Hyde <[email protected]> wrote:
>> 
>>> When I run that test I get
>>> 
>>> LogicalProject(input=HepRelVertex#10,$0=$9)
>>> 
>>> Have you screwed something up?
>>> 
>>>> On Aug 15, 2018, at 2:23 PM, Laurent Goujon <[email protected]> wrote:
>>>> 
>>>> I just ran RelOptRulesTest with a breakpoint in
>>>> AbstractRelNode#computeDigest() and I'm able to observe this kind of
>>>> digest:
>>>> 
>>> "LogicalProject(input=rel#6:LogicalWindow(input=rel#0:LogicalTableScan(table=[CATALOG,
>>>> SALES, EMP]),window#0=window(partition {0} order by [0] range between
>>>> UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])),$0=$9)"
>>>> 
>>>> On Wed, Aug 15, 2018 at 2:09 PM Laurent Goujon <[email protected]>
>>> wrote:
>>>> 
>>>>> Here's one (partial) example (truncated because it contains potentially
>>>>> sensitive info, and I didn't obfuscate it or try to reproduce it locally
>>>>> with non-sensitive data):
>>>>> 
>>>>> 
>>> "rel#8643738:LogicalProject.NONE.ANY([]).[](input=rel#8643736:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643702:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643668:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643634:LogicalProject.NONE.ANY([]).[](input=rel#8643632:LogicalAggregate.NONE.ANY([]).[](input=rel#8643630:LogicalAggregate.NONE.ANY([]).[](input=rel#8643628:LogicalProject.NONE.ANY([]).[](input=rel#8643626:LogicalFilter.NONE.ANY([]).[](input=rel#8643624:LogicalProject.NONE.ANY([]).[](input=rel#8643622:LogicalProject.NONE.ANY([]).[](input=rel#8643842:MultiJoin.NONE.ANY([]).[](input#0=rel#8643838:LogicalProject.NONE.ANY([]).[](input=rel#8643615:MultiJoin.NONE.ANY([]).[](input#0=rel#8643603:LogicalProject.NONE.ANY([]).[](input=rel#8643601:SampleCrel.NONE.ANY([]).[](input=rel#8639853:ScanCrel.NONE.ANY([]).[](table="...
>>>>> 
>>>>> The Logical* relnodes don't override the computeDigest method, so this is
>>>>> basically whatever AbstractRelNode#computeDigest is doing:
>>>>> 
>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L415
>>>>> 
>>>>> Laurent
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Aug 15, 2018 at 1:57 PM Julian Hyde <[email protected]> wrote:
>>>>> 
>>>>>> I thought the digest only included the IDs of the inputs, not the
>>>>>> digest of the inputs. Am I mistaken?
>>>>>> 
>>>>>> Could you give an example of large description & digest?
>>>>>> 
>>>>>>> On Aug 15, 2018, at 1:46 PM, Laurent Goujon <[email protected]>
>>> wrote:
>>>>>>> 
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> I'm looking for some guidance here before opening JIRAs/pull requests.
>>>>>>> 
>>>>>>> I'm examining a memory dump taken during a planning operation, and a
>>>>>>> significant amount of memory is taken up by strings used for RelNode
>>>>>>> digests and descriptions (some strings being around 130kb). In that
>>>>>>> particular case, the relnode tree is particularly deep, and since the
>>>>>>> digest is basically built recursively, the deeper/wider the tree, the
>>>>>>> longer the digest.
>>>>>>> 
>>>>>>> The easy solution would be to not go deep when adding inputs to the
>>>>>>> digest: instead of adding each input's description, add only its type,
>>>>>>> id and traits (and do not recurse). Would this break parts of Calcite,
>>>>>>> or cause other inconvenience because some use cases rely on the
>>>>>>> digest/description being basically the whole tree in textual form?
>>>>>>> 
>>>>>>> Laurent
>>>>>> 
>>>>>> 
>>> 
>>>
