[ https://issues.apache.org/jira/browse/HIVE-26653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028824#comment-18028824 ]
Soumyakanti Das edited comment on HIVE-26653 at 10/9/25 4:57 PM:
-----------------------------------------------------------------
For some reason EXPLAIN VECTORIZED doesn't work. It isn't present in
HiveParser.g either, although I do see some tests that use it. I will look into
it further, but I was able to run EXPLAIN VECTORIZATION DETAIL.
Old plan:
{noformat}
PLAN VECTORIZATION:
enabled: true
enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1

STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 5
(BROADCAST_EDGE)
Reducer 3 <- Map 1 (SIMPLE_EDGE)
Reducer 5 <- Map 4 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: table_a
Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 20220731)
-> 4:string
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: '20220731' (type: string)
minReductionHashAggr: 0.96774197
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: aid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [string]
Map 4
Map Operator Tree:
TableScan
alias: table_b
Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 20220731)
-> 4:string
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: '20220731' (type: string)
minReductionHashAggr: 0.9655172
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: bid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [string]
Reducer 2
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:string
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:string
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
Map Join Vectorization:
bigTableKeyColumns: 0:string
bigTableRetainColumnNums: []
className: VectorMapJoinInnerStringOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable IS
true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
projectedOutput: 1:string
smallTableValueMapping: 1:string
hashTableImplementationType: OPTIMIZED
outputColumnNames: _col1
input vertices:
1 Reducer 5
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
Map Join Vectorization:
bigTableKeyColumns: 1:string
bigTableRetainColumnNums: [2]
bigTableValueColumns: 2:string
bigTableValueExpressions: ConstantVectorExpression(val
20220731) -> 2:string
className: VectorMapJoinInnerBigOnlyStringOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
projectedOutput: 2:string
hashTableImplementationType: OPTIMIZED
outputColumnNames: _col1
input vertices:
1 Reducer 3
Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: _col1 (type: string)
outputColumnNames: _col0
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: [2]
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
File Sink Vectorization:
className: VectorFileSinkOperator
native: false
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 3
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:string
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:string
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reducer 5
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:string
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:string
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE

Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{noformat}
New plan:
{noformat}
PLAN VECTORIZATION:
enabled: true
enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1

STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 4 <- Map 3 (SIMPLE_EDGE)
Reducer 5 <- Map 3 (SIMPLE_EDGE), Reducer 2 (BROADCAST_EDGE), Reducer 4
(BROADCAST_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: table_b
filterExpr: (p_dt = '20220731') (type: boolean)
Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 29 Data size: 319 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 1) ->
4:boolean
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: true (type: boolean)
minReductionHashAggr: 0.9655172
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Reduce Sink Vectorization:
className: VectorReduceSinkLongOperator
keyColumns: 0:boolean
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: bid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [bigint]
Map 3
Map Operator Tree:
TableScan
alias: table_a
filterExpr: (p_dt = '20220731') (type: boolean)
Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 31 Data size: 341 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 1) ->
4:boolean
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: true (type: boolean)
minReductionHashAggr: 0.96774197
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Reduce Sink Vectorization:
className: VectorReduceSinkLongOperator
keyColumns: 0:boolean
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Reduce Sink Vectorization:
className: VectorReduceSinkLongOperator
keyColumns: 0:boolean
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: aid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [bigint]
Reducer 2
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:boolean
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:boolean
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Reduce Sink Vectorization:
className: VectorReduceSinkEmptyKeyOperator
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reducer 4
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:boolean
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:boolean
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Reduce Sink Vectorization:
className: VectorReduceSinkEmptyKeyOperator
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reducer 5
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:boolean
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:boolean
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0
1
Map Join Vectorization:
bigTableRetainColumnNums: []
className: VectorMapJoinInnerBigOnlyMultiKeyOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
hashTableImplementationType: OPTIMIZED
input vertices:
0 Reducer 2
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0
1
Map Join Vectorization:
bigTableRetainColumnNums: []
className: VectorMapJoinInnerBigOnlyMultiKeyOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
hashTableImplementationType: OPTIMIZED
input vertices:
1 Reducer 4
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: '20220731' (type: string)
outputColumnNames: _col0
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: [1]
selectExpressions: ConstantVectorExpression(val
20220731) -> 1:string
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
File Sink Vectorization:
className: VectorFileSinkOperator
native: false
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{noformat}
It looks like in the old plan we are using VectorMapJoinInnerStringOperator and
VectorMapJoinInnerBigOnlyStringOperator, while in the new plan both map joins use
VectorMapJoinInnerBigOnlyMultiKeyOperator with empty join keys.
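One way to double-check the operator difference noted above is to diff the className values of the two plans. This is only a minimal sketch in Python, assuming each plan has been saved as plain text. The function names are hypothetical, and the excerpts are abbreviated from the plans above rather than the full output:

```python
import re
from collections import Counter


def operator_classes(plan_text: str) -> Counter:
    """Count every `className:` value found in an EXPLAIN VECTORIZATION plan."""
    return Counter(re.findall(r"className:\s*(\S+)", plan_text))


def diff_classes(old_plan: str, new_plan: str) -> dict:
    """Return the operator classes that appear in only one of the two plans."""
    old, new = operator_classes(old_plan), operator_classes(new_plan)
    return {
        "only_old": sorted(set(old) - set(new)),
        "only_new": sorted(set(new) - set(old)),
    }


# Abbreviated excerpts (assumption: the real input would be the full plan text).
old_excerpt = """
className: VectorMapJoinInnerStringOperator
className: VectorMapJoinInnerBigOnlyStringOperator
className: VectorSelectOperator
"""
new_excerpt = """
className: VectorMapJoinInnerBigOnlyMultiKeyOperator
className: VectorMapJoinInnerBigOnlyMultiKeyOperator
className: VectorSelectOperator
"""

result = diff_classes(old_excerpt, new_excerpt)
print(result)
```

Operators common to both plans (such as VectorSelectOperator) drop out, so only the map join classes that changed remain in the result.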
was (Author: soumyakanti.das):
For some reason EXPLAIN VECTORIZED doesn't work. It isn't present in
HiveParser.g either, although I do see some tests that use it. I will look into
it further, but I was able to run EXPLAIN VECTORIZATION DETAIL.
Old plan:
{noformat}
PLAN VECTORIZATION:
enabled: true
enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1

STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 5
(BROADCAST_EDGE)
Reducer 3 <- Map 1 (SIMPLE_EDGE)
Reducer 5 <- Map 4 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: table_a
Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 20220731)
-> 4:string
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: '20220731' (type: string)
minReductionHashAggr: 0.96774197
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: aid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [string]
Map 4
Map Operator Tree:
TableScan
alias: table_b
Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 20220731)
-> 4:string
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: '20220731' (type: string)
minReductionHashAggr: 0.9655172
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: bid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [string]
Reducer 2
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:string
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:string
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
Map Join Vectorization:
bigTableKeyColumns: 0:string
bigTableRetainColumnNums: []
className: VectorMapJoinInnerStringOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable IS
true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
projectedOutput: 1:string
smallTableValueMapping: 1:string
hashTableImplementationType: OPTIMIZED
outputColumnNames: _col1
input vertices:
1 Reducer 5
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
Map Join Vectorization:
bigTableKeyColumns: 1:string
bigTableRetainColumnNums: [2]
bigTableValueColumns: 2:string
bigTableValueExpressions: ConstantVectorExpression(val
20220731) -> 2:string
className: VectorMapJoinInnerBigOnlyStringOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
projectedOutput: 2:string
hashTableImplementationType: OPTIMIZED
outputColumnNames: _col1
input vertices:
1 Reducer 3
Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: _col1 (type: string)
outputColumnNames: _col0
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: [2]
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
File Sink Vectorization:
className: VectorFileSinkOperator
native: false
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 3
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:string
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:string
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reducer 5
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:string
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:string
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
keyColumns: 0:string
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE
Column stats: COMPLETE
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{noformat}
New plan:
{noformat}
PLAN VECTORIZATION:
enabled: true
enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 4 <- Map 3 (SIMPLE_EDGE)
Reducer 5 <- Map 3 (SIMPLE_EDGE), Reducer 2 (BROADCAST_EDGE), Reducer 4
(BROADCAST_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: table_b
filterExpr: (p_dt = '20220731') (type: boolean)
Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 29 Data size: 319 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 1) ->
4:boolean
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: true (type: boolean)
minReductionHashAggr: 0.9655172
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Reduce Sink Vectorization:
className: VectorReduceSinkLongOperator
keyColumns: 0:boolean
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: bid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [bigint]
Map 3
Map Operator Tree:
TableScan
alias: table_a
filterExpr: (p_dt = '20220731') (type: boolean)
Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE
Column stats: COMPLETE
TableScan Vectorization:
native: true
vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string,
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>,
3:ROW__IS__DELETED:boolean]
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 31 Data size: 341 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: ConstantVectorExpression(val 1) ->
4:boolean
native: false
vectorProcessingMode: HASH
projectedOutputColumnNums: []
keys: true (type: boolean)
minReductionHashAggr: 0.96774197
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Reduce Sink Vectorization:
className: VectorReduceSinkLongOperator
keyColumns: 0:boolean
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Reduce Sink Vectorization:
className: VectorReduceSinkLongOperator
keyColumns: 0:boolean
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vector.serde.deserialize IS true
inputFormatFeatureSupport: [DECIMAL_64]
featureSupportInUse: [DECIMAL_64]
inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
includeColumns: []
dataColumns: aid:string
partitionColumnCount: 1
partitionColumns: p_dt:string
scratchColumnTypeNames: [bigint]
Reducer 2
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:boolean
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:boolean
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Reduce Sink Vectorization:
className: VectorReduceSinkEmptyKeyOperator
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reducer 4
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:boolean
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:boolean
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Reduce Sink Vectorization:
className: VectorReduceSinkEmptyKeyOperator
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reducer 5
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez] IS true
reduceColumnNullOrder: z
reduceColumnSortOrder: +
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
dataColumnCount: 1
dataColumns: KEY._col0:boolean
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
Group By Vectorization:
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:boolean
native: false
vectorProcessingMode: MERGE_PARTIAL
projectedOutputColumnNums: []
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: []
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0
1
Map Join Vectorization:
bigTableRetainColumnNums: []
className: VectorMapJoinInnerBigOnlyMultiKeyOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
hashTableImplementationType: OPTIMIZED
input vertices:
0 Reducer 2
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0
1
Map Join Vectorization:
bigTableRetainColumnNums: []
className: VectorMapJoinInnerBigOnlyMultiKeyOperator
native: true
nativeConditionsMet: hive.mapjoin.optimized.hashtable
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true,
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports
Key Types IS true
nonOuterSmallTableKeyMapping: []
hashTableImplementationType: OPTIMIZED
input vertices:
1 Reducer 4
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: '20220731' (type: string)
outputColumnNames: _col0
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumnNums: [1]
selectExpressions: ConstantVectorExpression(val
20220731) -> 1:string
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
File Sink Vectorization:
className: VectorFileSinkOperator
native: false
Statistics: Num rows: 1 Data size: 92 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{noformat}
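To compare how the vertices are wired in the two plans without reading the whole dump, the "Edges:" sections can be parsed mechanically. The following is only a rough sketch: parse_edges and broadcast_targets are hypothetical helper names (not part of Hive), and the input lists are the "Edges:" lines copied from the old and new plans in this comment, including the mail line wrapping.

```python
import re

def parse_edges(edge_lines):
    """Map each target vertex to a sorted list of (source vertex, edge type)."""
    # Re-join lines wrapped by the mail client: a continuation has no '<-'.
    joined = []
    for line in edge_lines:
        line = line.strip()
        if "<-" in line or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line
    edges = {}
    for line in joined:
        target, _, sources = line.partition("<-")
        pairs = re.findall(r"((?:Map|Reducer) \d+) \((\w+)\)", sources)
        edges[target.strip()] = sorted(pairs)
    return edges

def broadcast_targets(edges):
    """Vertices that receive at least one BROADCAST_EDGE input."""
    return sorted(
        target
        for target, sources in edges.items()
        if any(kind == "BROADCAST_EDGE" for _, kind in sources)
    )

# "Edges:" lines as they appear in the old plan.
OLD_EDGES = [
    "Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 5",
    "(BROADCAST_EDGE)",
    "Reducer 3 <- Map 1 (SIMPLE_EDGE)",
    "Reducer 5 <- Map 4 (SIMPLE_EDGE)",
]

# "Edges:" lines as they appear in the new plan.
NEW_EDGES = [
    "Reducer 2 <- Map 1 (SIMPLE_EDGE)",
    "Reducer 4 <- Map 3 (SIMPLE_EDGE)",
    "Reducer 5 <- Map 3 (SIMPLE_EDGE), Reducer 2 (BROADCAST_EDGE), Reducer 4",
    "(BROADCAST_EDGE)",
]

print(broadcast_targets(parse_edges(OLD_EDGES)))
print(broadcast_targets(parse_edges(NEW_EDGES)))
```

With these inputs the vertex receiving the broadcast inputs is Reducer 2 in the old plan and Reducer 5 in the new plan, which matches where the Map Join Operators appear in the dumps above.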
> Wrong results when (map) joining multiple tables on partition column
> --------------------------------------------------------------------
>
> Key: HIVE-26653
> URL: https://issues.apache.org/jira/browse/HIVE-26653
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Attachments: hive_26653.q, hive_26653_explain.txt,
> hive_26653_explain_cbo.txt, table_a.csv, table_b.csv
>
>
> The result of the query must have exactly one row matching the date specified
> in the WHERE clause but the query returns nothing.
> {code:sql}
> CREATE TABLE table_a (`aid` string ) PARTITIONED BY (`p_dt` string)
> row format delimited fields terminated by ',' stored as textfile;
> LOAD DATA LOCAL INPATH '../../data/files/_tbla.csv' into TABLE table_a;
> CREATE TABLE table_b (`bid` string) PARTITIONED BY (`p_dt` string)
> row format delimited fields terminated by ',' stored as textfile;
> LOAD DATA LOCAL INPATH '../../data/files/_tblb.csv' into TABLE table_b;
> set hive.auto.convert.join=true;
> set hive.optimize.semijoin.conversion=false;
> SELECT a.p_dt
> FROM ((SELECT p_dt
> FROM table_b
> GROUP BY p_dt) a
> JOIN
> (SELECT p_dt
> FROM table_a
> GROUP BY p_dt) b ON a.p_dt = b.p_dt
> JOIN
> (SELECT p_dt
> FROM table_a
> GROUP BY p_dt) c ON a.p_dt = c.p_dt)
> WHERE a.p_dt = translate(cast(to_date(date_sub('2022-08-01', 1)) AS string),
> '-', '');
> {code}
> +Expected result+
> 20220731
> +Actual result+
> Empty
> To reproduce the problem, the tables need to have some data. Values in the aid
> and bid columns are not important. For the p_dt column, use one of the
> following values: 20220731, 20220630.
> I will attach some sample data with which the problem can be reproduced. The
> tables look like below.
> ||aid||p_dt||
> |611|20220731|
> |239|20220630|
> |...|...|
> The problem can be reproduced via qtest in current master
> (commit
> [6b05d64ce8c7161415d97a7896ea50025322e30a|https://github.com/apache/hive/commit/6b05d64ce8c7161415d97a7896ea50025322e30a])
> by running the TestMiniLlapLocalCliDriver.
> The problem shows up only for a specific query plan (will attach shortly), so
> if the plan changes slightly the problem may not appear anymore; this is why
> we need to explicitly set hive.optimize.semijoin.conversion and
> hive.auto.convert.join to trigger the problem.
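
As a sanity check on the expected result quoted above, the query can be modelled outside Hive. This is a minimal sketch, not Hive's implementation: expected_p_dt mimics the WHERE-clause expression (date_sub, to_date, cast, translate), and run_query treats each GROUP BY subquery as the set of distinct p_dt values and the two inner joins as a set intersection. The table_a rows are the two shown in the description; the bid values are placeholders, since the description says they are not important.

```python
from datetime import date, timedelta

def expected_p_dt(day="2022-08-01"):
    """Model translate(cast(to_date(date_sub(day, 1)) AS string), '-', '')."""
    previous = date.fromisoformat(day) - timedelta(days=1)  # date_sub(day, 1)
    return previous.isoformat().replace("-", "")            # drop the dashes

def run_query(table_a, table_b, wanted):
    """Model the three-way join of the GROUP BY p_dt subqueries plus the filter."""
    a = {p_dt for _, p_dt in table_b}  # SELECT p_dt FROM table_b GROUP BY p_dt
    b = {p_dt for _, p_dt in table_a}  # SELECT p_dt FROM table_a GROUP BY p_dt
    c = {p_dt for _, p_dt in table_a}  # same subquery, second join input
    # Inner joins on equal p_dt keep only values present in all three inputs.
    return sorted(p_dt for p_dt in a & b & c if p_dt == wanted)

# Rows from the description's sample table.
table_a = [("611", "20220731"), ("239", "20220630")]
# Placeholder bid values (assumption); only p_dt matters for the query.
table_b = [("b1", "20220731"), ("b2", "20220630")]

print(expected_p_dt())
print(run_query(table_a, table_b, expected_p_dt()))
```

Under these assumptions the filter value is 20220731 and the model returns exactly one row, which is the expected result stated in the description rather than the empty result Hive returns.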
--
This message was sent by Atlassian Jira
(v8.20.10#820010)