[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-11394:
--------------------------------
Description:
Add detail to the EXPLAIN output showing why Map and Reduce work is not vectorized.
New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
The ONLY option suppresses most non-vectorization elements of the plan.
SUMMARY shows vectorization information for the PLAN (whether vectorization is enabled) and a summary of the Map and Reduce work.
If the optional clauses are omitted, ONLY is not applied and SUMMARY is used.
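The clause combinations can be sketched as a small statement builder. This is a hypothetical Python helper (`explain_vectorization` is not part of Hive), assuming the grammar exactly as written above: ONLY is optional, and the level is either omitted (SUMMARY by default) or one of SUMMARY and DETAIL. The query text is a placeholder.

```python
from typing import Optional


# Hypothetical helper (not part of Hive): builds a statement that follows
# EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL] <query>.
def explain_vectorization(query: str, only: bool = False,
                          level: Optional[str] = None) -> str:
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")  # suppress most non-vectorization plan elements
    if level is not None:
        level = level.upper()
        if level not in ("SUMMARY", "DETAIL"):
            raise ValueError("level must be SUMMARY or DETAIL")
        parts.append(level)
    # With neither clause given, the defaults apply: not ONLY, and SUMMARY.
    parts.append(query)
    return " ".join(parts)


# Placeholder query, not one of the queries behind the plans below.
print(explain_vectorization("SELECT * FROM t"))
print(explain_vectorization("SELECT * FROM t", only=True, level="detail"))
```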
Here are some examples:
EXPLAIN VECTORIZATION example (equivalent to EXPLAIN VECTORIZATION SUMMARY; note the PLAN VECTORIZATION, Map Vectorization, and Reduce Vectorization sections):
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      …
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      …
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: decimal_date_test
                  Statistics: Num rows: 12288 Data size: 2467616 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: boolean)
                    Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: cdate (type: date)
                      outputColumnNames: _col0
                      Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: date)
                        sort order: +
                        Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                groupByVectorOutput: true
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: date)
                outputColumnNames: _col0
                Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
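To see at a glance which vertices ran vectorized, one can scan the per-vertex "Execution mode:" lines. The Python below is a minimal sketch, assuming the plain-text layout above (each vertex header such as "Map 1" on its own line); `execution_modes` is a hypothetical helper, not a Hive API.

```python
import re

# Assumption: each vertex header ("Map 1", "Reducer 2") stands alone on its line;
# edge lines such as "Reducer 2 <- Map 1 (SIMPLE_EDGE)" therefore do not match.
VERTEX_RE = re.compile(r"^(Map|Reducer) \d+$")


def execution_modes(explain_text: str) -> dict:
    """Map each vertex name to the modes listed on its 'Execution mode:' line."""
    modes = {}
    vertex = None
    for raw in explain_text.splitlines():
        line = raw.strip()
        if VERTEX_RE.match(line):
            vertex = line
        elif vertex is not None and line.startswith("Execution mode:"):
            value = line.split(":", 1)[1]
            modes[vertex] = [mode.strip() for mode in value.split(",")]
    return modes


sample = """
      Vertices:
        Map 1
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2
            Execution mode: vectorized, llap
"""
modes = execution_modes(sample)
not_vectorized = [v for v, m in modes.items() if "vectorized" not in m]
print(modes)
print(not_vectorized)
```

A vertex whose mode list lacks "vectorized", or that has no Execution mode line at all (and so is missing from the result), is a candidate for a closer look at its Map or Reduce Vectorization section.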
EXPLAIN VECTORIZATION DETAIL example (note the added Select Vectorization, Group By Vectorization, and Reduce Sink Vectorization sections):
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      …
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      …
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: vectortab2korc
                  Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: bo (type: boolean), b (type: bigint)
                    outputColumnNames: bo, b
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        nativeConditionsMet: Supported IS true
                        selectExpressions: IdentityExpression[7:boolean], IdentityExpression[3:bigint]
                        vectorized: true
                    Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: max(b)
                      Group By Vectorization:
                          aggregators: VectorUDAFMaxLong(IdentityExpression[3:bigint])
                          className: VectorGroupByOperator
                          vectorOutput: true
                          keyExpressions: IdentityExpression[7:boolean]
                          native: false
                          nativeConditionsNotMet: Supported IS false
                          vectorized: true
                      keys: bo (type: boolean)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            native: true
                            nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                            vectorized: true
                        Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
            Execution mode: vectorized
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2
            Execution mode: vectorized
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                groupByVectorOutput: true
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                Group By Vectorization:
                    aggregators: VectorUDAFMaxLong(IdentityExpression[1:bigint])
                    className: VectorGroupByOperator
                    vectorOutput: true
                    keyExpressions: IdentityExpression[0:boolean]
                    native: false
                    nativeConditionsNotMet: Supported IS false
                    vectorized: true
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: boolean)
                  sort order: -
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkOperator
                      native: false
                      nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      nativeConditionsNotMet: Uniform Hash IS false
                      vectorized: true
                  Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint)
        …
{code}
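The DETAIL output is where the reasons for non-native execution show up: each operator-level section pairs a className with nativeConditionsNotMet. A rough Python sketch for pulling those pairs out of the text, assuming every entry sits on a single line as in the example above; `non_native_operators` and `CONDITION_SPLIT` are hypothetical names, not part of Hive.

```python
import re

# Split on commas that are not inside [...] so that a condition such as
# "hive.execution.engine tez IN [tez, spark] IS true" stays in one piece.
CONDITION_SPLIT = re.compile(r",\s*(?![^\[]*\])")


def non_native_operators(explain_text: str) -> list:
    """Return (className, unmet conditions) for each non-native vectorized operator."""
    found = []
    class_name = None
    for raw in explain_text.splitlines():
        line = raw.strip()
        if line.startswith("className:"):
            # Remember the most recent operator class; its NotMet line follows it.
            class_name = line.split(":", 1)[1].strip()
        elif line.startswith("nativeConditionsNotMet:"):
            unmet = line.split(":", 1)[1].strip()
            conditions = [c.strip() for c in CONDITION_SPLIT.split(unmet) if c.strip()]
            found.append((class_name, conditions))
    return found


sample = """
Group By Vectorization:
    className: VectorGroupByOperator
    native: false
    nativeConditionsNotMet: Supported IS false
Reduce Sink Vectorization:
    className: VectorReduceSinkOperator
    native: false
    nativeConditionsNotMet: Uniform Hash IS false
"""
for class_name, conditions in non_native_operators(sample):
    print(class_name, conditions)
```

Operators whose sections report only nativeConditionsMet (such as VectorSelectOperator above) are skipped, so the result lists exactly the operators that fell back to non-native vectorized execution.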
EXPLAIN VECTORIZATION ONLY example:
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      Vertices:
        Map 1
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Map 2
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true

  Stage: Stage-0
{code}
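Because ONLY strips the operator trees, its output reduces to one Map or Reduce Vectorization section per vertex, which makes it convenient to post-process. A hedged Python sketch, assuming the plain-text layout above and that "vectorized:" is the last entry in each section, as in these examples; `vertex_vectorization` is a hypothetical helper.

```python
import re

VERTEX_RE = re.compile(r"^(Map|Reducer) \d+$")
SECTION_HEADERS = ("Map Vectorization:", "Reduce Vectorization:")
FLAGS = ("enabled", "allNative", "usesVectorUDFAdaptor", "vectorized")


def vertex_vectorization(explain_text: str) -> dict:
    """Collect the boolean flags of each vertex's Map/Reduce Vectorization section."""
    summary = {}
    vertex = None
    in_section = False
    for raw in explain_text.splitlines():
        line = raw.strip()
        if VERTEX_RE.match(line):
            vertex, in_section = line, False
        elif line in SECTION_HEADERS and vertex is not None:
            summary[vertex] = {}
            in_section = True
        elif in_section and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            if key in FLAGS:
                summary[vertex][key] = value == "true"
            if key == "vectorized":
                # Assumption: as in the examples, 'vectorized' closes the section.
                in_section = False
    return summary


sample = """
      Vertices:
        Map 1
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Map 2
            Map Vectorization:
                enabled: true
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true
"""
summary = vertex_vectorization(sample)
partly_native = [v for v, f in summary.items() if f.get("vectorized") and not f.get("allNative")]
print(partly_native)
```

For this sample, Map 2 is fully native, while Map 1 is vectorized but not all of its vectorized operators are native.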
The standard @Explain annotation type is used; a new 'vectorization' annotation marks each new class and method.
Like the other (non-vectorization) EXPLAIN variations, EXPLAIN VECTORIZATION also works with FORMATTED, which produces JSON.
EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example:
{code}
{"PLAN VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 3","type":"BROADCAST_EDGE"},{"parent":"Map 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 3":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 4":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer 2":{"Reduce Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled IS true","hive.execution.engine tez IN [tez, spark] IS true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
{code}
or pretty printed:
{code}
{
  "PLAN VECTORIZATION": {
    "enabled": true,
    "enabledConditionsMet": [
      "hive.vectorized.execution.enabled IS true"
    ]
  },
  "STAGE DEPENDENCIES": {
    "Stage-1": {
      "ROOT STAGE": "TRUE"
    },
    "Stage-0": {
      "DEPENDENT STAGES": "Stage-1"
    }
  },
  "STAGE PLANS": {
    "Stage-1": {
      "Tez": {
        "Edges:": {
          "Map 1": [
            {
              "parent": "Map 3",
              "type": "BROADCAST_EDGE"
            },
            {
              "parent": "Map 4",
              "type": "BROADCAST_EDGE"
            }
          ],
          "Reducer 2": {
            "parent": "Map 1",
            "type": "SIMPLE_EDGE"
          }
        },
        "Vertices:": {
          "Map 1": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 3": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 4": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Reducer 2": {
            "Reduce Vectorization:": {
              "enabled:": "true",
              "enableConditionsMet:": [
                "hive.vectorized.execution.reduce.enabled IS true",
                "hive.execution.engine tez IN [tez, spark] IS true"
              ],
              "groupByVectorOutput:": "true",
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          }
        }
      }
    },
    "Stage-0": {
    }
  }
}
{code}
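The FORMATTED variant is the easiest to consume programmatically. A minimal Python sketch using the standard json module, assuming the key layout shown above: vertex-level keys carry a trailing colon (for example "vectorized:") and their values are the strings "true"/"false", while the plan-level "enabled" is a JSON boolean. `vertex_flags` is a hypothetical helper, and the trimmed sample keeps only the keys it reads.

```python
import json


def vertex_flags(formatted_json: str) -> dict:
    """Per-vertex 'vectorized' and 'allNative' flags from EXPLAIN ... FORMATTED output."""
    plan = json.loads(formatted_json)
    flags = {}
    for stage in plan.get("STAGE PLANS", {}).values():
        # Stages without a Tez section (such as the empty Stage-0) yield no vertices.
        vertices = stage.get("Tez", {}).get("Vertices:", {})
        for vertex, body in vertices.items():
            for section in ("Map Vectorization:", "Reduce Vectorization:"):
                if section in body:
                    info = body[section]
                    # Keys end with ':' and values are the strings "true"/"false".
                    flags[vertex] = {
                        "vectorized": info.get("vectorized:") == "true",
                        "allNative": info.get("allNative:") == "true",
                    }
    return flags


sample = """{"PLAN VECTORIZATION": {"enabled": true},
 "STAGE PLANS": {"Stage-1": {"Tez": {"Vertices:": {
   "Map 1": {"Map Vectorization:": {"allNative:": "false", "vectorized:": "true"}},
   "Map 3": {"Map Vectorization:": {"allNative:": "true", "vectorized:": "true"}},
   "Reducer 2": {"Reduce Vectorization:": {"allNative:": "false", "vectorized:": "true"}}}}},
  "Stage-0": {}}}"""
print(vertex_flags(sample))
```

Comparing against the string "true" rather than calling bool() matters here, because bool("false") is True in Python.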
> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)