[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848462#comment-17848462 ]
ASF subversion and git services commented on IMPALA-12800: ---------------------------------------------------------- Commit ae6846b1cd039b2cd6f8753ce3ff810c5b2d3ce3 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ae6846b1c ] IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds ExprSubstitutionMap::compose() and combine() call verify() to check the new ExprSubstitutionMap for duplicates. This algorithm is O(n^2) and can add significant overhead to SQLs with a large number of expressions or inline views. This changes verify() to skip the check for release builds (keeping it for debug builds). In a query with 20+ layers of inline views and thousands of expressions, turning off the verify() call cuts the execution time from 51 minutes to 18 minutes. This doesn't fully solve slowness in ExprSubstitutionMap. Further improvement would require Expr to support hash-based algorithms, which is a much larger change. Testing: - Manual performance comparison with/without the verify() call Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0 Reviewed-on: http://gerrit.cloudera.org:8080/21444 Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > ------------------------------------------------------------------------------------- > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Affects Versions: Impala 4.3.0 > Reporter: Joe McDonnell > Priority: Critical > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. > > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709) > at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400) > at org.apache.thrift.TSerializer.serialize(TSerializer.java:84) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194) > at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275) > at > org.apache.impala.analysis.Analyzer.isTrueWithNullSlots(Analyzer.java:2888) > at > org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:181) > at > org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:147) > at > org.apache.impala.analysis.TupleIsNullPredicate.wrapExprs(TupleIsNullPredicate.java:136){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org