[ https://issues.apache.org/jira/browse/IMPALA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764266#comment-16764266 ]
Philip Zeyliger commented on IMPALA-6590:
-----------------------------------------

For purposes of reproduction, the following shows how far from linear analysis time is in the number of columns in the VALUES statement:

{code}
$ for i in 256 512 1024 2048 4096 8192 16384 32768; do (echo 'VALUES ('; for x in $(seq $i); do echo "cast($x as string),"; done; echo "NULL); profile;") | time impala-shell.sh -f /dev/stdin |& grep Analysis; done
- Analysis finished: 35.027ms (34.359ms)
- Analysis finished: 76.808ms (75.678ms)
- Analysis finished: 188.936ms (186.829ms)
- Analysis finished: 499.325ms (494.968ms)
- Analysis finished: 1s606ms (1s598ms)
- Analysis finished: 6s663ms (6s647ms)
- Analysis finished: 29s844ms (29s812ms)
- Analysis finished: 2m37s (2m37s)
{code}

My ad-hoc jstacking suggests that there's an issue in the code path shown in the stack trace below, as well as in calling into the native code (serially, thereby possibly incurring a lot of JNI overhead). Looking at the source, SelectStmt.java:291 is in a loop over every expression in the statement, and it ends up inserting each one into a List. Since each lookup compares against the entries already there, the number of {{equals()}} calls is quadratic.
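To make the quadratic growth concrete, here is a minimal, self-contained sketch. It is not Impala's code: {{QuadraticLookupSketch}}, {{Key}}, {{listBased}} and {{hashBased}} are hypothetical names invented for illustration, and {{Key}} only stands in for an expression whose {{equals()}} calls we want to count. It assumes the pattern described above, where every new item is first searched for in a List by linear scan and then appended, and compares that against a hash-based lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names exist in Impala.
class QuadraticLookupSketch {

    /** Hypothetical stand-in for an expression; counts equals() calls. */
    static final class Key {
        static long equalsCalls = 0;
        final int id;

        Key(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() { return Integer.hashCode(id); }
    }

    /** Searches a List by linear scan before appending each of n distinct keys. */
    static long listBased(int n) {
        Key.equalsCalls = 0;
        List<Key> seen = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Key k = new Key(i);
            // contains() compares k against every element already in the list.
            if (!seen.contains(k)) {
                seen.add(k);
            }
        }
        return Key.equalsCalls;
    }

    /** Same loop, but the lookup goes through a hash table instead of a scan. */
    static long hashBased(int n) {
        Key.equalsCalls = 0;
        Map<Key, Key> seen = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Key k = new Key(i);
            seen.putIfAbsent(k, k);
        }
        return Key.equalsCalls;
    }

    public static void main(String[] args) {
        for (int n = 256; n <= 4096; n *= 2) {
            System.out.printf("n=%d list equals()=%d hash equals()=%d%n",
                    n, listBased(n), hashBased(n));
        }
    }
}
```

With n distinct keys the list-based loop performs n(n-1)/2 comparisons, so doubling the number of columns roughly quadruples the work. Whether a hash-based structure would suit the real code depends on the expression class having a {{hashCode()}} consistent with {{equals()}}, which is not checked here.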
{code}
"Thread-50" #70 prio=5 os_prio=0 tid=0x000000000b471000 nid=0x10cc runnable [0x00007ff90190a000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.impala.analysis.SlotRef.localEquals(SlotRef.java:193)
	at org.apache.impala.analysis.SlotRef$1.matches(SlotRef.java:206)
	at org.apache.impala.analysis.Expr.matches(Expr.java:841)
	at org.apache.impala.analysis.Expr.equals(Expr.java:865)
	at org.apache.impala.analysis.ExprSubstitutionMap.get(ExprSubstitutionMap.java:67)
	at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyzeSelectClause(SelectStmt.java:291)
	at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:223)
	at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:207)
	at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:200)
	at org.apache.impala.analysis.UnionStmt$UnionOperand.analyze(UnionStmt.java:88)
	at org.apache.impala.analysis.UnionStmt.analyzeOperands(UnionStmt.java:280)
	at org.apache.impala.analysis.UnionStmt.analyze(UnionStmt.java:219)
	at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:448)
	at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:418)
	at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1282)
	at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1249)
	at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1219)
	at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:168)
{code}

> Disable expr rewrites and codegen for VALUES() statements
> ---------------------------------------------------------
>
>                 Key: IMPALA-6590
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6590
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>            Reporter: Alexander Behm
>            Priority: Major
>              Labels: perf, planner, ramp-up, regression
>
> The analysis of statements with big VALUES clauses like INSERT INTO <tbl>
> VALUES is slow due to expression rewrites like constant folding. The
> performance of such statements has regressed since the introduction of expr
> rewrites and constant folding in IMPALA-1788.
> We should skip expr rewrites for VALUES altogether since it mostly provides
> no benefit but can have a large overhead due to evaluation of expressions in
> the backend (constant folding). These expressions are ultimately evaluated
> and materialized in the backend anyway, so there's no point in folding them
> during analysis.
> Similarly, there is no point in doing codegen for these exprs in the backend
> union node.
> *Workaround*
> {code}
> SET ENABLE_EXPR_REWRITES=FALSE;
> SET DISABLE_CODEGEN=TRUE;
> {code}