[ 
https://issues.apache.org/jira/browse/IMPALA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764266#comment-16764266
 ] 

Philip Zeyliger commented on IMPALA-6590:
-----------------------------------------

For reproduction purposes, the following shows how far from linear analysis 
time is in the number of columns in the VALUES statement:
{code}
$ for i in 256 512 1024 2048 4096 8192 16384 32768; do
    (echo 'VALUES ('
     for x in $(seq $i); do echo "cast($x as string),"; done
     echo "NULL); profile;") |
      time impala-shell.sh -f /dev/stdin |& grep Analysis
  done
   - Analysis finished: 35.027ms (34.359ms)
   - Analysis finished: 76.808ms (75.678ms)
   - Analysis finished: 188.936ms (186.829ms)
   - Analysis finished: 499.325ms (494.968ms)
   - Analysis finished: 1s606ms (1s598ms)
   - Analysis finished: 6s663ms (6s647ms)
   - Analysis finished: 29s844ms (29s812ms)
   - Analysis finished: 2m37s (2m37s)
{code}
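
To put a number on "not linear", here is a small sketch that computes the growth factor between consecutive runs. The timings are copied from the output above (2m37s taken as 157000 ms); the class and method names are made up for illustration and are not part of Impala.

```java
import java.util.Locale;

public class AnalysisScaling {
    // Column counts and "Analysis finished" times (ms) copied from the run above.
    static final int[] COLUMNS = {256, 512, 1024, 2048, 4096, 8192, 16384, 32768};
    static final double[] MILLIS =
        {35.027, 76.808, 188.936, 499.325, 1606, 6663, 29844, 157000};

    // Local growth exponent between two consecutive measurements:
    // if time ~ n^k, then doubling n multiplies the time by 2^k.
    static double exponent(double before, double after) {
        return Math.log(after / before) / Math.log(2);
    }

    public static void main(String[] args) {
        for (int i = 1; i < COLUMNS.length; i++) {
            double ratio = MILLIS[i] / MILLIS[i - 1];
            System.out.printf(Locale.ROOT,
                "%5d -> %5d columns: x%.2f (exponent ~%.2f)%n",
                COLUMNS[i - 1], COLUMNS[i], ratio,
                exponent(MILLIS[i - 1], MILLIS[i]));
        }
    }
}
```

A factor of 2 per doubling would be linear and a factor of 4 quadratic. With these numbers the factor starts a little above 2 and exceeds 4 for the three largest doublings.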

My ad-hoc jstacking suggests that, in addition to the time spent calling into 
the native code (serially, and so possibly incurring a lot of JNI overhead), 
there's an issue in the stack below. Looking at the source, 
SelectStmt.java:291 is inside a loop over every expression in the statement; 
each iteration calls {{ExprSubstitutionMap.get()}}, which compares against 
existing entries with {{equals()}}, and the expression ends up being inserted 
into a List. So, the number of {{equals()}} calls is quadratic.

{code}
"Thread-50" #70 prio=5 os_prio=0 tid=0x000000000b471000 nid=0x10cc runnable [0x00007ff90190a000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.impala.analysis.SlotRef.localEquals(SlotRef.java:193)
        at org.apache.impala.analysis.SlotRef$1.matches(SlotRef.java:206)
        at org.apache.impala.analysis.Expr.matches(Expr.java:841)
        at org.apache.impala.analysis.Expr.equals(Expr.java:865)
        at org.apache.impala.analysis.ExprSubstitutionMap.get(ExprSubstitutionMap.java:67)
        at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyzeSelectClause(SelectStmt.java:291)
        at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:223)
        at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:207)
        at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:200)
        at org.apache.impala.analysis.UnionStmt$UnionOperand.analyze(UnionStmt.java:88)
        at org.apache.impala.analysis.UnionStmt.analyzeOperands(UnionStmt.java:280)
        at org.apache.impala.analysis.UnionStmt.analyze(UnionStmt.java:219)
        at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:448)
        at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:418)
        at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1282)
        at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1249)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1219)
        at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:168)
{code}
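
The access pattern in that stack (a linear, {{equals()}}-based search followed by an append, once per expression) can be illustrated with a standalone sketch. This is not the Impala code: {{Key}} and {{ListBackedLookup}} are hypothetical stand-ins, and a plain {{ArrayList}} plays the role of the list-backed map.

```java
import java.util.ArrayList;
import java.util.List;

public class ListBackedLookup {
    // Hypothetical stand-in for an expression; counts equals() invocations.
    static final class Key {
        static long equalsCalls = 0;
        final int id;

        Key(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() { return Integer.hashCode(id); }
    }

    // For each of n distinct keys: search the list linearly, then append.
    // Returns the total number of equals() calls made along the way.
    static long equalsCallsFor(int n) {
        Key.equalsCalls = 0;
        List<Key> seen = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Key k = new Key(i);
            if (!seen.contains(k)) {   // scans every existing entry
                seen.add(k);
            }
        }
        return Key.equalsCalls;
    }

    public static void main(String[] args) {
        for (int n = 256; n <= 4096; n *= 2) {
            System.out.println(n + " keys -> " + equalsCallsFor(n) + " equals() calls");
        }
    }
}
```

With n distinct keys the i-th lookup scans i existing entries, so the total is n(n-1)/2 comparisons, and doubling n roughly quadruples the count. A hash-based lookup would avoid the scan, provided {{hashCode()}} is consistent with {{equals()}}; whether that holds for Impala's Expr classes is not something this sketch checks.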

> Disable expr rewrites and codegen for VALUES() statements
> ---------------------------------------------------------
>
>                 Key: IMPALA-6590
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6590
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>            Reporter: Alexander Behm
>            Priority: Major
>              Labels: perf, planner, ramp-up, regression
>
> The analysis of statements with big VALUES clauses like INSERT INTO <tbl> 
> VALUES is slow due to expression rewrites like constant folding. The 
> performance of such statements has regressed since the introduction of expr 
> rewrites and constant folding in IMPALA-1788.
> We should skip expr rewrites for VALUES altogether since they mostly provide 
> no benefit but can have a large overhead due to evaluation of expressions in 
> the backend (constant folding). These expressions are ultimately evaluated 
> and materialized in the backend anyway, so there's no point in folding them 
> during analysis.
> Similarly, there is no point in doing codegen for these exprs in the backend 
> union node.
> *Workaround*
> {code}
> SET ENABLE_EXPR_REWRITES=FALSE;
> SET DISABLE_CODEGEN=TRUE;
> {code}



