[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-20419:
---------------------------
Description:
The Vectorizer ends up in this loop because the VectorPartitionDesc is modified
after it has been used as a HashMap key, which changes its hashCode() and
equals() results after it has been placed in the HashMap.
{code}
HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 621ms
java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 recursive calls>
java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, Object) HashMap.java:1989
java.util.HashMap.putVal(int, Object, Object, boolean, boolean) HashMap.java:637
java.util.HashMap.put(Object, Object) HashMap.java:611
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, VectorPartitionDesc, Map) Vectorizer.java:1272
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) Vectorizer.java:1654
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, boolean) Vectorizer.java:1109
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, Stack, Object[]) Vectorizer.java:961
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) TaskGraphWalker.java:180
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, HashMap) TaskGraphWalker.java:125
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) Vectorizer.java:2442
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, ParseContext, Context) TezCompiler.java:717
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, HashSet, HashSet) TaskCompiler.java:258
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) CalcitePlanner.java:358
{code}
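The underlying hazard can be shown in isolation. The following is a minimal
sketch, not the Hive code: Desc is a hypothetical stand-in for
VectorPartitionDesc whose equals() and hashCode() depend on a mutable field,
and the field name inputFormat is an assumption made for illustration. It only
shows that the map loses track of a key mutated after insertion; it does not
reproduce the spin in HashMap$TreeNode.find seen in the trace above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Hypothetical stand-in for VectorPartitionDesc: equality and hash
    // both depend on a field that can change after construction.
    static final class Desc {
        String inputFormat;

        Desc(String inputFormat) {
            this.inputFormat = inputFormat;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Desc
                && Objects.equals(inputFormat, ((Desc) o).inputFormat);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(inputFormat);
        }
    }

    // Inserts a key, mutates it in place, then looks it up again.
    static boolean foundAfterMutation() {
        Map<Desc, String> map = new HashMap<>();
        Desc key = new Desc("orc");
        map.put(key, "v1");
        key.inputFormat = "rcfile"; // hash changes; the entry keeps the old hash
        return map.containsKey(key);
    }

    // Re-inserting the same (mutated) object adds a second entry for it.
    static int sizeAfterReinsert() {
        Map<Desc, String> map = new HashMap<>();
        Desc key = new Desc("orc");
        map.put(key, "v1");
        key.inputFormat = "rcfile";
        map.put(key, "v2");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
        System.out.println(sizeAfterReinsert());  // 2
    }
}
```

The usual ways to avoid this are to finish all mutation before the object is
used as a key, or to key the map on an immutable snapshot of the fields that
take part in equals() and hashCode().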
was:
With ACID tables, the format and schema layouts are much more strictly
controlled - the table cannot be partly ORC and partly RCFile.
Relying on this assumption removes this loop and the slow schema check between
partitions before vectorizing the operators - the worst-case performance is the
common & correct case, where all of them match.
{code}
HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 621ms
java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 recursive calls>
java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, Object) HashMap.java:1989
java.util.HashMap.putVal(int, Object, Object, boolean, boolean) HashMap.java:637
java.util.HashMap.put(Object, Object) HashMap.java:611
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, VectorPartitionDesc, Map) Vectorizer.java:1272
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) Vectorizer.java:1654
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, boolean) Vectorizer.java:1109
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, Stack, Object[]) Vectorizer.java:961
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) TaskGraphWalker.java:180
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, HashMap) TaskGraphWalker.java:125
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) Vectorizer.java:2442
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, ParseContext, Context) TezCompiler.java:717
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, HashSet, HashSet) TaskCompiler.java:258
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) CalcitePlanner.java:358
{code}
> Vectorization: Prevent mutation of VectorPartitionDesc after being used in a
> hashmap key
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-20419
> URL: https://issues.apache.org/jira/browse/HIVE-20419
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Gopal V
> Priority: Major