[
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt McCline updated HIVE-11394:
--------------------------------
Description:
Add detail to the EXPLAIN output showing why Map or Reduce work is not
vectorized.
New syntax is: EXPLAIN VECTORIZATION \[ONLY\]
\[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
The ONLY option suppresses most non-vectorization elements.
SUMMARY shows vectorization information for the PLAN (is vectorization enabled)
and a summary of Map and Reduce work.
OPERATOR shows vectorization information for operators. E.g. Filter
Vectorization. It includes all information of SUMMARY, too.
EXPRESSION shows vectorization information for expressions. E.g.
predicateExpression. It includes all information of SUMMARY and OPERATOR, too.
DETAIL shows the most detailed vectorization information.
It includes all information of SUMMARY, OPERATOR, and EXPRESSION, too.
When the optional clauses are omitted, the defaults are: ONLY is not applied,
and the detail level is SUMMARY.
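To make the defaults and the cumulative detail levels concrete, here is a minimal Python sketch. It is only an illustration of the rules described above, not Hive's parser: the names VectorizationOptions, parse_options and shows are hypothetical, and the tokenizing is deliberately naive.

```python
from dataclasses import dataclass

# Detail levels in increasing order; each level includes all lower ones.
LEVELS = ("SUMMARY", "OPERATOR", "EXPRESSION", "DETAIL")


@dataclass(frozen=True)
class VectorizationOptions:
    """Hypothetical holder for the two optional clauses."""
    only: bool = False      # default: ONLY is not applied
    level: str = "SUMMARY"  # default detail level

    def shows(self, section_level: str) -> bool:
        """True if output belonging to section_level is displayed."""
        return LEVELS.index(section_level) <= LEVELS.index(self.level)


def parse_options(statement: str) -> VectorizationOptions:
    """Read [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL] after EXPLAIN VECTORIZATION."""
    tokens = [t.upper() for t in statement.split()]
    if tokens[:2] != ["EXPLAIN", "VECTORIZATION"]:
        raise ValueError("not an EXPLAIN VECTORIZATION statement")
    rest = tokens[2:4]  # at most the two optional keywords
    only = bool(rest) and rest[0] == "ONLY"
    if only:
        rest = rest[1:]
    level = rest[0] if rest and rest[0] in LEVELS else "SUMMARY"
    return VectorizationOptions(only=only, level=level)


if __name__ == "__main__":
    opts = parse_options("EXPLAIN VECTORIZATION EXPRESSION SELECT 1")
    print(opts.only, opts.level)   # False EXPRESSION
    print(opts.shows("OPERATOR"))  # True: EXPRESSION includes OPERATOR
    print(opts.shows("DETAIL"))    # False
```

The ordered tuple is the whole design: "includes all information of" becomes a simple index comparison.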
Here are some examples:
EXPLAIN VECTORIZATION example:
(Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization sections)
Since SUMMARY is the default, this is the same output as EXPLAIN VECTORIZATION
SUMMARY.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION OPERATOR
Notice the added Select Vectorization, Group By Vectorization, Reduce Sink
Vectorization sections in this example.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION EXPRESSION
Notice the aaaaa in this example.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION DETAIL
Notice the aaaaa in this example.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION ONLY example:
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION ONLY OPERATOR example:
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION ONLY EXPRESSION example:
{code}
{code}
EXPLAIN VECTORIZATION ONLY DETAIL example:
{code}
coming soon…
{code}
The standard @Explain Annotation Type is used. A new 'vectorization'
annotation marks each new class and method.
EXPLAIN VECTORIZATION also works with FORMATTED, like the other
(non-vectorization) EXPLAIN variations.
EXPLAIN VECTORIZATION FORMATTED example:
{code}
coming soon…
{code}
or pretty printed:
{code}
coming soon…
{code}
was:
Add detail to the EXPLAIN output showing why a Map and Reduce work is not
vectorized.
New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
The ONLY option suppresses most non-vectorization elements.
SUMMARY shows vectorization information for the PLAN (is vectorization enabled)
and a summary of Map and Reduce work.
The optional clause defaults are not ONLY and SUMMARY.
Here are some examples:
EXPLAIN VECTORIZATION example:
(Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization sections)
It is the same as EXPLAIN VECTORIZATION SUMMARY.
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
STAGE PLANS:
  Stage: Stage-1
    Tez
…
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
…
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: decimal_date_test
              Statistics: Num rows: 12288 Data size: 2467616 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: boolean)
                Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: cdate (type: date)
                  outputColumnNames: _col0
                  Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: date)
                    sort order: +
                    Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
          Execution mode: vectorized, llap
          LLAP IO: all inputs
          Map Vectorization:
            enabled: true
            enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
            groupByVectorOutput: true
            inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
            allNative: false
            usesVectorUDFAdaptor: false
            vectorized: true
        Reducer 2
          Execution mode: vectorized, llap
          Reduce Vectorization:
            enabled: true
            enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
            groupByVectorOutput: true
            allNative: false
            usesVectorUDFAdaptor: false
            vectorized: true
          Reduce Operator Tree:
            Select Operator
              expressions: KEY.reducesinkkey0 (type: date)
              outputColumnNames: _col0
              Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
EXPLAIN VECTORIZATION DETAIL
(Note the added Select Vectorization, Group By Vectorization, Reduce Sink
Vectorization sections in this example)
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
STAGE PLANS:
  Stage: Stage-1
    Tez
…
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
…
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: vectortab2korc
              Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: bo (type: boolean), b (type: bigint)
                outputColumnNames: bo, b
                Select Vectorization:
                  className: VectorSelectOperator
                  native: true
                  nativeConditionsMet: Supported IS true
                  selectExpressions: IdentityExpression[7:boolean], IdentityExpression[3:bigint]
                  vectorized: true
                Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: max(b)
                  Group By Vectorization:
                    aggregators: VectorUDAFMaxLong(IdentityExpression[3:bigint])
                    className: VectorGroupByOperator
                    vectorOutput: true
                    keyExpressions: IdentityExpression[7:boolean]
                    native: false
                    nativeConditionsNotMet: Supported IS false
                    vectorized: true
                  keys: bo (type: boolean)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: boolean)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: boolean)
                    Reduce Sink Vectorization:
                      className: VectorReduceSinkLongOperator
                      native: true
                      nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      vectorized: true
                    Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint)
          Execution mode: vectorized
          Map Vectorization:
            enabled: true
            enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
            groupByVectorOutput: true
            inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
            allNative: false
            usesVectorUDFAdaptor: false
            vectorized: true
        Reducer 2
          Execution mode: vectorized
          Reduce Vectorization:
            enabled: true
            enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
            groupByVectorOutput: true
            allNative: false
            usesVectorUDFAdaptor: false
            vectorized: true
          Reduce Operator Tree:
            Group By Operator
              aggregations: max(VALUE._col0)
              Group By Vectorization:
                aggregators: VectorUDAFMaxLong(IdentityExpression[1:bigint])
                className: VectorGroupByOperator
                vectorOutput: true
                keyExpressions: IdentityExpression[0:boolean]
                native: false
                nativeConditionsNotMet: Supported IS false
                vectorized: true
              keys: KEY._col0 (type: boolean)
              mode: mergepartial
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: boolean)
                sort order: -
                Reduce Sink Vectorization:
                  className: VectorReduceSinkOperator
                  native: false
                  nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                  nativeConditionsNotMet: Uniform Hash IS false
                  vectorized: true
                Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: bigint)
…
{code}
EXPLAIN VECTORIZATION ONLY example:
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      Vertices:
        Map 1
          Map Vectorization:
            enabled: true
            enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
            groupByVectorOutput: true
            inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
            allNative: false
            usesVectorUDFAdaptor: false
            vectorized: true
        Map 2
          Map Vectorization:
            enabled: true
            enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
            groupByVectorOutput: true
            inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
            allNative: true
            usesVectorUDFAdaptor: false
            vectorized: true
  Stage: Stage-0
{code}
The standard @Explain Annotation Type is used. A new 'vectorization'
annotation marks each new class and method.
Works for FORMATTED, like other non-vectorization EXPLAIN variations.
EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED
{code}
{"PLAN VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 3","type":"BROADCAST_EDGE"},{"parent":"Map 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 3":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 4":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer 2":{"Reduce Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled IS true","hive.execution.engine tez IN [tez, spark] IS true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
{code}
or pretty printed:
{code}
{
  "PLAN VECTORIZATION": {
    "enabled": true,
    "enabledConditionsMet": [
      "hive.vectorized.execution.enabled IS true"
    ]
  },
  "STAGE DEPENDENCIES": {
    "Stage-1": {
      "ROOT STAGE": "TRUE"
    },
    "Stage-0": {
      "DEPENDENT STAGES": "Stage-1"
    }
  },
  "STAGE PLANS": {
    "Stage-1": {
      "Tez": {
        "Edges:": {
          "Map 1": [
            {
              "parent": "Map 3",
              "type": "BROADCAST_EDGE"
            },
            {
              "parent": "Map 4",
              "type": "BROADCAST_EDGE"
            }
          ],
          "Reducer 2": {
            "parent": "Map 1",
            "type": "SIMPLE_EDGE"
          }
        },
        "Vertices:": {
          "Map 1": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 3": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 4": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Reducer 2": {
            "Reduce Vectorization:": {
              "enabled:": "true",
              "enableConditionsMet:": [
                "hive.vectorized.execution.reduce.enabled IS true",
                "hive.execution.engine tez IN [tez, spark] IS true"
              ],
              "groupByVectorOutput:": "true",
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          }
        }
      }
    },
    "Stage-0": {
    }
  }
}
{code}
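Because the FORMATTED output is JSON, a script can read it directly. The Python sketch below collects the vectorized and allNative flags for each vertex. It assumes the key names seen in the sample above (note the trailing colons, as in "Vertices:" and "allNative:", and the string values "true"/"false"); those names come from this sample only and may differ in later patches. The function name summarize_vertices is hypothetical.

```python
import json


def summarize_vertices(explain_json: str) -> dict:
    """Map each vertex name to its vectorized/allNative flags.

    Assumes the layout of the sample EXPLAIN VECTORIZATION ... FORMATTED
    output: STAGE PLANS -> stage -> engine (e.g. "Tez") -> "Vertices:".
    """
    plan = json.loads(explain_json)
    summary = {}
    for stage in plan.get("STAGE PLANS", {}).values():
        for engine in stage.values():
            if not isinstance(engine, dict):
                continue
            for name, vertex in engine.get("Vertices:", {}).items():
                for key, info in vertex.items():
                    # Matches "Map Vectorization:" and "Reduce Vectorization:".
                    if key.endswith("Vectorization:"):
                        summary[name] = {
                            "vectorized": info.get("vectorized:") == "true",
                            "allNative": info.get("allNative:") == "true",
                        }
    return summary


if __name__ == "__main__":
    # A cut-down document with the same shape as the sample output.
    sample = json.dumps({
        "STAGE PLANS": {
            "Stage-1": {"Tez": {"Vertices:": {
                "Map 1": {"Map Vectorization:": {
                    "allNative:": "false", "vectorized:": "true"}},
                "Map 3": {"Map Vectorization:": {
                    "allNative:": "true", "vectorized:": "true"}},
            }}},
            "Stage-0": {},
        }
    })
    for vertex, flags in summarize_vertices(sample).items():
        print(vertex, flags)
```

Vertices where allNative is false are the ones worth inspecting at the OPERATOR or DETAIL level.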
> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch, HIVE-11394.07.patch
>