[ https://issues.apache.org/jira/browse/IMPALA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764266#comment-16764266 ]
Philip Zeyliger commented on IMPALA-6590:
-----------------------------------------
For purposes of reproduction, the following shows how far from linear
analysis time is in the number of columns in the VALUES statement:
{code}
$ for i in 256 512 1024 2048 4096 8192 16384 32768; do
    (echo 'VALUES ('; for x in $(seq $i); do echo "cast($x as string),"; done; echo "NULL); profile;") |
      time impala-shell.sh -f /dev/stdin |& grep Analysis
  done
- Analysis finished: 35.027ms (34.359ms)
- Analysis finished: 76.808ms (75.678ms)
- Analysis finished: 188.936ms (186.829ms)
- Analysis finished: 499.325ms (494.968ms)
- Analysis finished: 1s606ms (1s598ms)
- Analysis finished: 6s663ms (6s647ms)
- Analysis finished: 29s844ms (29s812ms)
- Analysis finished: 2m37s (2m37s)
{code}
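To make the growth rate concrete, one can look at the ratio between successive timings. Each step doubles the column count, so a linear cost would give ratios near 2 and a purely quadratic one ratios near 4. The following is only a small sketch: the class name AnalysisGrowth is made up for illustration, and the timings are the ones above converted to milliseconds (2m37s taken as 157000 ms).

```java
import java.util.Locale;

public class AnalysisGrowth {
    // Analysis times from the run above, in milliseconds (2m37s ~ 157000 ms).
    static final double[] MILLIS =
        {35.027, 76.808, 188.936, 499.325, 1606, 6663, 29844, 157000};

    // Ratio of each timing to the previous one; each step doubles the column count.
    static double[] ratios(double[] t) {
        double[] r = new double[t.length - 1];
        for (int i = 1; i < t.length; i++) {
            r[i - 1] = t[i] / t[i - 1];
        }
        return r;
    }

    public static void main(String[] args) {
        int columns = 256;
        for (double r : ratios(MILLIS)) {
            columns *= 2;
            System.out.println(
                String.format(Locale.ROOT, "%5d columns: x%.2f", columns, r));
        }
    }
}
```

The ratios climb from about 2.2 at 512 columns to about 5.3 at 32768, so the cost is clearly superlinear and keeps getting worse as the statement grows.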
My ad-hoc jstacking suggests that time is going both into the frames shown
below and into calls into the native code (made serially, and so possibly
incurring a lot of JNI overhead). Looking at the source, SelectStmt.java:291
is in a loop over every expression in the statement, and it ends up inserting
each one into a List that is searched linearly (note ExprSubstitutionMap.get()
calling Expr.equals() in the stack). So, the number of {{equals()}} calls is
quadratic.
{code}
"Thread-50" #70 prio=5 os_prio=0 tid=0x000000000b471000 nid=0x10cc runnable [0x00007ff90190a000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.impala.analysis.SlotRef.localEquals(SlotRef.java:193)
        at org.apache.impala.analysis.SlotRef$1.matches(SlotRef.java:206)
        at org.apache.impala.analysis.Expr.matches(Expr.java:841)
        at org.apache.impala.analysis.Expr.equals(Expr.java:865)
        at org.apache.impala.analysis.ExprSubstitutionMap.get(ExprSubstitutionMap.java:67)
        at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyzeSelectClause(SelectStmt.java:291)
        at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:223)
        at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:207)
        at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:200)
        at org.apache.impala.analysis.UnionStmt$UnionOperand.analyze(UnionStmt.java:88)
        at org.apache.impala.analysis.UnionStmt.analyzeOperands(UnionStmt.java:280)
        at org.apache.impala.analysis.UnionStmt.analyze(UnionStmt.java:219)
        at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:448)
        at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:418)
        at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1282)
        at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1249)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1219)
        at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:168)
{code}
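The quadratic pattern is easy to reproduce outside Impala. The sketch below is not Impala's code: ListBackedMap is a hypothetical stand-in for a substitution map that keeps its keys in a List and finds them by scanning with equals(), and Key is a made-up class that just counts equals() calls. It assumes one lookup followed by one insert per expression, which is my reading of the stack above rather than something I have verified line by line.

```java
import java.util.ArrayList;
import java.util.List;

public class QuadraticLookup {
    static long equalsCalls = 0;

    // Hypothetical key type that counts how often equals() is invoked.
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Hypothetical map backed by parallel lists; get() scans linearly.
    static final class ListBackedMap {
        private final List<Key> lhs = new ArrayList<>();
        private final List<String> rhs = new ArrayList<>();

        String get(Key k) {
            for (int i = 0; i < lhs.size(); i++) {
                if (lhs.get(i).equals(k)) return rhs.get(i);
            }
            return null;
        }

        void put(Key k, String v) { lhs.add(k); rhs.add(v); }
    }

    // Looks up and then inserts n distinct keys; returns the equals() count.
    static long countEquals(int n) {
        equalsCalls = 0;
        ListBackedMap map = new ListBackedMap();
        for (int i = 0; i < n; i++) {
            Key k = new Key(i);
            if (map.get(k) == null) map.put(k, "expr" + i);
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        for (int n = 256; n <= 4096; n *= 2) {
            System.out.println(n + " keys: " + countEquals(n) + " equals() calls");
        }
    }
}
```

With n distinct keys every lookup misses and scans the whole list, so the count is n(n-1)/2 and doubling n roughly quadruples the work. In Impala each equals() is itself a tree comparison (Expr.matches), so the real cost per call is higher than in this toy.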
> Disable expr rewrites and codegen for VALUES() statements
> ---------------------------------------------------------
>
> Key: IMPALA-6590
> URL: https://issues.apache.org/jira/browse/IMPALA-6590
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
> Reporter: Alexander Behm
> Priority: Major
> Labels: perf, planner, ramp-up, regression
>
> The analysis of statements with big VALUES clauses like INSERT INTO <tbl>
> VALUES is slow due to expression rewrites like constant folding. The
> performance of such statements has regressed since the introduction of expr
> rewrites and constant folding in IMPALA-1788.
> We should skip expr rewrites for VALUES altogether since it mostly provides
> no benefit but can have a large overhead due to evaluation of expressions in
> the backend (constant folding). These expressions are ultimately evaluated
> and materialized in the backend anyway, so there's no point in folding them
> during analysis.
> Similarly, there is no point in doing codegen for these exprs in the backend
> union node.
> *Workaround*
> {code}
> SET ENABLE_EXPR_REWRITES=FALSE;
> SET DISABLE_CODEGEN=TRUE;
> {code}