[
https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gopal V updated HIVE-20419:
---------------------------
Summary: Vectorization: Prevent mutation of VectorPartitionDesc after being
used in a hashmap key (was: Vectorization: Speed up
VectorizationDispatcher.validateInputFormatAndSchemaEvolution() for ACIDv2)
> Vectorization: Prevent mutation of VectorPartitionDesc after being used in a
> hashmap key
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-20419
> URL: https://issues.apache.org/jira/browse/HIVE-20419
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Gopal V
> Priority: Major
>
> With ACID table, the format and schema layouts are much more strictly
> controlled - the table cannot be made of partial ORC and partial RCFile.
> This assumption can remove this loop and the slow check for schema between
> each partition before vectorizing the operators - the worst-case performance
> is the common & correct case, where all of them match.
> {code}
> HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample:
> 621ms
> java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7
> recursive calls>
> java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object,
> Object) HashMap.java:1989
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean)
> HashMap.java:637
> java.util.HashMap.put(Object, Object) HashMap.java:611
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc,
> VectorPartitionDesc, Map) Vectorizer.java:1272
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc,
> boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork,
> String, TableScanOperator, Vectorizer$VectorTaskColumnInfo)
> Vectorizer.java:1654
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork,
> Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork,
> boolean) Vectorizer.java:1109
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node,
> Stack, Object[]) Vectorizer.java:961
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack,
> TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node)
> TaskGraphWalker.java:180
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection,
> HashMap) TaskGraphWalker.java:125
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext)
> Vectorizer.java:2442
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List,
> ParseContext, Context) TezCompiler.java:717
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List,
> HashSet, HashSet) TaskCompiler.java:258
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode,
> SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode)
> CalcitePlanner.java:358
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)