[jira] [Commented] (HIVE-28264) OOM/slow compilation when query contains SELECT clauses with nested expressions

Stamatis Zampetakis (Jira) Thu, 16 May 2024 07:35:18 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846978#comment-17846978
 ]


Stamatis Zampetakis commented on HIVE-28264:
--------------------------------------------

To understand the problem let's consider a much simpler variation of the query 
in the description.

{code:sql}
SELECT x1 from
    (SELECT concat_ws('L1',x0, x0) as x1 from
        (SELECT concat_ws('L0',title, title) as x0 from t0) t1) t2;
{code}

It is easy to see that the SELECT clauses can be merged together leading to the 
following query.
{code:sql}
SELECT concat_ws('L1',concat_ws('L0',title, title), concat_ws('L0',title, 
title)) as x1 from t0;
{code}
The two queries are equivalent, however they don't contain the same number of 
{{concat_ws}} calls. The first contains two calls while the second contains 
three calls and the expression in the SELECT clause is bigger than both of the 
previous expressions.

When the query has nested function calls (CONCAT or anything else) then merging 
those together leads to bigger expressions. In fact the growth rate of the 
expression is exponential to the number of its arguments. 

+Examples:+
When the CONCAT function has two arguments then for each nested level the 
expression grows by a factor of two. The size of the final expression is 
(1-2^L)/(1-2) where L is the levels of nesting.
When the CONCAT functions has four arguments (as the query in the description) 
the for each nested level the expression grows by a factor of four.  The size 
of the final expression is (1-4^L)/(1-4) where L is the levels of nesting.

There are various optimization rules (eg., HiveFieldTrimmerRule, 
HiveProjectMergeRule, etc.) that will try to merge expressions together and 
when this happens in an uncontrolled manner the resulting expression is 
exponentially big, which can further lead to OOM problems, very slow 
compilation,  etc. Clearly it is not always beneficial to merge expressions 
together and the aforementioned rules do have some logic in place to avoid this 
kind of huge expansion. Both rules pass from 
{{RelOptUtil#pushPastProjectUnlessBloat}} so they can be tuned via the bloat 
parameter.

However, there are also other rules that are affected by this exponential 
growth problem , such as {{HiveFilterProjectTransposeRule}}, and currently they 
do not have logic to prevent that.
+Before+
{code:sql}
SELECT x1 from
    (SELECT concat_ws('L1',x0, x0) as x1 from
        (SELECT concat_ws('L0',title, title) as x0 from t0) t1) t2
WHERE x1 = 'Something';
{code}
+After+
{code:sql}
SELECT x1 from
    (SELECT concat_ws('L1',x0, x0) as x1 from
        (SELECT concat_ws('L0',title, title) as x0 from t0
         WHERE concat_ws('L1',concat_ws('L0',title, title), 
concat_ws('L0',title, title)) = 'Something') t1) t2;
{code}
In this case the exponential growth happens when trying to push the filter down 
past the projections. A possible solution would be to improve 
HiveFilterProjectTransposeRule and other rules that may be affected to avoid 
creating overly complex expressions using a similar bloat configuration 
parameter.

> OOM/slow compilation when query contains SELECT clauses with nested 
> expressions
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-28264
>                 URL: https://issues.apache.org/jira/browse/HIVE-28264
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> {code:sql}
> CREATE TABLE t0 (`title` string);
> SELECT x10 from
>     (SELECT concat_ws('L10',x9, x9, x9, x9) as x10 from
>         (SELECT concat_ws('L9',x8, x8, x8, x8) as x9 from
>             (SELECT concat_ws('L8',x7, x7, x7, x7) as x8 from
>                 (SELECT concat_ws('L7',x6, x6, x6, x6) as x7 from
>                     (SELECT concat_ws('L6',x5, x5, x5, x5) as x6 from
>                         (SELECT concat_ws('L5',x4, x4, x4, x4) as x5 from
>                             (SELECT concat_ws('L4',x3, x3, x3, x3) as x4 from
>                                 (SELECT concat_ws('L3',x2, x2, x2, x2) as x3 
> from
>                                     (SELECT concat_ws('L2',x1, x1, x1, x1) as 
> x2 from
>                                         (SELECT concat_ws('L1',x0, x0, x0, 
> x0) as x1 from
>                                             (SELECT concat_ws('L0',title, 
> title, title, title) as x0 from t0) t1) t2) t3) t4) t5) t6) t7) t8) t9) t10) t
> WHERE x10 = 'Something';
> {code}
> The query above fails with OOM when run with the TestMiniLlapLocalCliDriver 
> and the default max heap size configuration effective for tests (-Xmx2048m).
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>       at java.util.Arrays.copyOf(Arrays.java:3332)
>       at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>       at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>       at java.lang.StringBuilder.append(StringBuilder.java:136)
>       at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:152)
>       at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>       at org.apache.calcite.rex.RexCall.appendOperands(RexCall.java:105)
>       at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:151)
>       at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>       at java.lang.String.valueOf(String.java:2994)
>       at java.lang.StringBuilder.append(StringBuilder.java:131)
>       at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:90)
>       at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>       at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>       at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explainInputs(RelWriterImpl.java:122)
>       at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:116)
>       at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>       at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>       at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308)
>       at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2292)
>       at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger.ruleProductionSucceeded(RuleEventLogger.java:73)
>       at 
> org.apache.calcite.plan.MulticastRelOptListener.ruleProductionSucceeded(MulticastRelOptListener.java:68)
>       at 
> org.apache.calcite.plan.AbstractRelOptPlanner.notifyTransformation(AbstractRelOptPlanner.java:370)
>       at 
> org.apache.calcite.plan.hep.HepPlanner.applyTransformationResults(HepPlanner.java:702)
>       at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:545)
>       at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>       at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271)
>       at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>       at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>       at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2452)
>       at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2411)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-28264) OOM/slow compilation when query contains SELECT clauses with nested expressions

Reply via email to