[
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt McCline updated HIVE-11394:
--------------------------------
Description:
Add detail to the EXPLAIN output showing why Map or Reduce work is not
vectorized.
New syntax is: EXPLAIN VECTORIZATION \[ONLY\]
\[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
The ONLY option suppresses most non-vectorization elements.
SUMMARY shows vectorization information for the PLAN (is vectorization enabled)
and a summary of Map and Reduce work.
OPERATOR shows vectorization information for operators. E.g. Filter
Vectorization. It includes all information of SUMMARY, too.
EXPRESSION shows vectorization information for expressions. E.g.
predicateExpression. It includes all information of SUMMARY and OPERATOR, too.
DETAIL shows the most detailed vectorization information.
It includes all information of SUMMARY, OPERATOR, and EXPRESSION, too.
The defaults for the optional clauses are non-ONLY and SUMMARY.
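As a rough illustration of the syntax above, the statement text can be assembled by a small helper. This is only a sketch: build_explain_vectorization is a hypothetical function, not part of Hive, and the sample query is an assumption made for the example.

```python
# Hypothetical helper (not part of Hive): assembles an EXPLAIN VECTORIZATION
# statement from the optional clauses described above.
LEVELS = ("SUMMARY", "OPERATOR", "EXPRESSION", "DETAIL")


def build_explain_vectorization(query, only=False, level=None):
    """Return 'EXPLAIN VECTORIZATION [ONLY] [level] <query>' as a string."""
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")
    if level is not None:
        level = level.upper()
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        parts.append(level)
    parts.append(query.strip())
    return " ".join(parts)


# Assumed sample query. Omitting both clauses corresponds to the defaults
# (non-ONLY, SUMMARY).
print(build_explain_vectorization("SELECT avg(cint) FROM alltypesorc"))
print(build_explain_vectorization("SELECT avg(cint) FROM alltypesorc",
                                  only=True, level="detail"))
```

Leaving level unset, rather than always emitting SUMMARY, keeps the generated statement identical to what a user would type when relying on the defaults.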
---------------------------------------------------------------------------------------------------
Here are some examples:
EXPLAIN VECTORIZATION example:
(Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization sections)
Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION SUMMARY.
Under Reducer 3’s "Reduce Vectorization:" you’ll see
notVectorizedReason: Aggregation Function UDF avg parameter expression for
GROUPBY operator: Data type struct<count:bigint,sum:double,input:int> of
Column\[VALUE._col2\] not supported
For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput: false",
which says the node has a GROUP BY with an AVG or some other aggregator that
outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators are
row-mode, i.e. not vector output.
"usesVectorUDFAdaptor: false" would be true if at least one vectorized
expression were using VectorUDFAdaptor.
And "allNative: false" will be true when all operators are native. Today,
GROUP BY and FILE SINK are not native. MAP JOIN and REDUCE SINK are
conditionally native. FILTER and SELECT are native.
{code}
PLAN VECTORIZATION:
enabled: true
enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
...
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
...
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: alltypesorc
Statistics: Num rows: 12288 Data size: 36696 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: cint (type: int)
outputColumnNames: cint
Statistics: Num rows: 12288 Data size: 36696 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
keys: cint (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 5775 Data size: 17248 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 5775 Data size: 17248 Basic
stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vectorized.input.format IS true
groupByVectorOutput: true
inputFileFormats:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
Reducer 2
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez, spark] IS true
groupByVectorOutput: false
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 5775 Data size: 17248 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: sum(_col0), count(_col0), avg(_col0), std(_col0)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1 Data size: 172 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 172 Basic stats:
COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint), _col1 (type:
bigint), _col2 (type: struct<count:bigint,sum:double,input:int>), _col3 (type:
struct<count:bigint,sum:double,variance:double>)
Reducer 3
Execution mode: llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez, spark] IS true
notVectorizedReason: Aggregation Function UDF avg parameter
expression for GROUPBY operator: Data type
struct<count:bigint,sum:double,input:int> of Column[VALUE._col2] not supported
vectorized: false
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0), count(VALUE._col1),
avg(VALUE._col2), std(VALUE._col3)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE
Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE
Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
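To show how the summary fields above might be consumed, here is a minimal Python sketch that scans EXPLAIN VECTORIZATION text and reports, per vertex, the "vectorized" flag and any notVectorizedReason. It assumes each field sits on a single line (the wrapping seen in this mail is treated as an email artifact); summarize_vertices and the shortened sample text are illustrative and not part of Hive.

```python
import re

# A vertex header is a line consisting only of "Map <n>" or "Reducer <n>".
VERTEX = re.compile(r"^(Map|Reducer) \d+$")


def summarize_vertices(explain_text):
    """Map each vertex name to its 'vectorized' flag and optional reason."""
    result = {}
    vertex = None
    for raw in explain_text.splitlines():
        line = raw.strip()
        if VERTEX.match(line):
            vertex = line
            result[vertex] = {"vectorized": None, "reason": None}
        elif vertex and line.startswith("vectorized:"):
            value = line.split(":", 1)[1].strip()
            result[vertex]["vectorized"] = (value == "true")
        elif vertex and line.startswith("notVectorizedReason:"):
            result[vertex]["reason"] = line.split(":", 1)[1].strip()
    return result


# Shortened sample built from the fields shown in the example above.
sample = """\
Map 1
    Map Vectorization:
        enabled: true
        vectorized: true
Reducer 3
    Reduce Vectorization:
        enabled: true
        notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct<count:bigint,sum:double,input:int> of Column[VALUE._col2] not supported
        vectorized: false
"""

for name, info in summarize_vertices(sample).items():
    if info["vectorized"] is False:
        print(name, "is not vectorized:", info["reason"])
```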
EXPLAIN VECTORIZATION OPERATOR
Notice the added TableScan Vectorization, Select Vectorization, Group By
Vectorization, Map Join Vectorization, Reduce Sink Vectorization sections in
this example.
Notice the nativeConditionsMet detail on why Reduce Sink Vectorization is native.
{code}
PLAN VECTORIZATION:
enabled: true
enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Map 2 <- Map 1 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 3 Data size: 294 Basic stats: COMPLETE
Column stats: NONE
TableScan Vectorization:
native: true
projectedOutputColumns: [0, 1]
Filter Operator
Filter Vectorization:
className: VectorFilterOperator
native: true
predicate: c2 is not null (type: boolean)
Statistics: Num rows: 3 Data size: 294 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: c1 (type: int), c2 (type: char(10))
outputColumnNames: _col0, _col1
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumns: [0, 1]
Statistics: Num rows: 3 Data size: 294 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: char(20))
sort order: +
Map-reduce partition columns: _col1 (type: char(20))
Reduce Sink Vectorization:
className: VectorReduceSinkStringOperator
native: true
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS
true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true,
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
Statistics: Num rows: 3 Data size: 294 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col0 (type: int)
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vectorized.input.format IS true
groupByVectorOutput: true
inputFileFormats:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
allNative: true
usesVectorUDFAdaptor: false
vectorized: true
Map 2
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 3 Data size: 324 Basic stats: COMPLETE
Column stats: NONE
TableScan Vectorization:
native: true
projectedOutputColumns: [0, 1]
Filter Operator
Filter Vectorization:
className: VectorFilterOperator
native: true
predicate: c2 is not null (type: boolean)
Statistics: Num rows: 3 Data size: 324 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: c1 (type: int), c2 (type: char(20))
outputColumnNames: _col0, _col1
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumns: [0, 1]
Statistics: Num rows: 3 Data size: 324 Basic stats:
COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: char(20))
1 _col1 (type: char(20))
Map Join Vectorization:
className: VectorMapJoinInnerStringOperator
native: true
nativeConditionsMet:
hive.vectorized.execution.mapjoin.native.enabled IS true, hive.execution.engine
tez IN [tez, spark] IS true, One MapJoin Condition IS true, No nullsafe IS
true, Supports Key Types IS true, Not empty key IS true, When Fast Hash Table,
then requires no Hybrid Hash Join IS true, Small table vectorizes IS true
outputColumnNames: _col0, _col1, _col2, _col3
input vertices:
0 Map 1
Statistics: Num rows: 3 Data size: 323 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Reduce Sink Vectorization:
className: VectorReduceSinkOperator
native: false
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS
true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for
keys IS true, LazyBinarySerDe for values IS true
nativeConditionsNotMet: Uniform Hash IS false
Statistics: Num rows: 3 Data size: 323 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col1 (type: char(10)), _col2
(type: int), _col3 (type: char(20))
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vectorized.input.format IS true
groupByVectorOutput: true
inputFileFormats:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
Reducer 3
Execution mode: vectorized, llap
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled
IS true, hive.execution.engine tez IN [tez, spark] IS true
groupByVectorOutput: true
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type:
char(10)), VALUE._col1 (type: int), VALUE._col2 (type: char(20))
outputColumnNames: _col0, _col1, _col2, _col3
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumns: [0, 1, 2, 3]
Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
File Sink Vectorization:
className: VectorFileSinkOperator
native: false
Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE
Column stats: NONE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
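The nativeConditionsMet and nativeConditionsNotMet values in the example above are comma-separated lists of "<condition> IS true|false" items. The following Python sketch splits such a value into (condition, flag) pairs. It is an illustration only; parse_conditions is a hypothetical helper, and the format is inferred from the output shown here rather than from any documented grammar.

```python
import re

# Each item ends with " IS true" or " IS false", followed by a comma or the
# end of the string. The condition text itself may contain commas.
CONDITION = re.compile(r"\s*(.+?) IS (true|false)(?:,|$)")


def parse_conditions(text):
    """Split 'A IS true, B IS false' into [(name, bool), ...]."""
    # Collapse line wrapping so a value split across lines parses as one line.
    text = " ".join(text.split())
    return [(name, value == "true") for name, value in CONDITION.findall(text)]


met = parse_conditions(
    "hive.vectorized.execution.mapjoin.native.enabled IS true, "
    "hive.execution.engine tez IN [tez, spark] IS true, "
    "When Fast Hash Table, then requires no Hybrid Hash Join IS true"
)
not_met = parse_conditions("Uniform Hash IS false")

for name, ok in met + not_met:
    print("met    " if ok else "NOT met", name)
```

Splitting on ", " alone would break items such as "When Fast Hash Table, then requires no Hybrid Hash Join" and "tez IN [tez, spark]", which is why the sketch anchors on the trailing "IS true" or "IS false" instead.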
EXPLAIN VECTORIZATION EXPRESSION
Notice the predicateExpression in this example.
{code}
PLAN VECTORIZATION:
enabled: true
enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: vector_interval_2
Statistics: Num rows: 2 Data size: 788 Basic stats: COMPLETE
Column stats: NONE
TableScan Vectorization:
native: true
projectedOutputColumns: [0, 1, 2, 3, 4, 5]
Filter Operator
Filter Vectorization:
className: VectorFilterOperator
native: true
predicateExpression: FilterExprAndExpr(children:
FilterTimestampScalarEqualTimestampColumn(val 2001-01-01 01:02:03.0, col
6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000)
-> 6:timestamp) -> boolean, FilterTimestampScalarNotEqualTimestampColumn(val
2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1,
val 0 01:02:04.000000000) -> 6:timestamp) -> boolean,
FilterTimestampScalarLessEqualTimestampColumn(val 2001-01-01 01:02:03.0, col
6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000)
-> 6:timestamp) -> boolean, FilterTimestampScalarLessTimestampColumn(val
2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1,
val 0 01:02:04.000000000) -> 6:timestamp) -> boolean,
FilterTimestampScalarGreaterEqualTimestampColumn(val 2001-01-01 01:02:03.0, col
6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
01:02:03.000000000) -> 6:timestamp) -> boolean,
FilterTimestampScalarGreaterTimestampColumn(val 2001-01-01 01:02:03.0, col
6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
01:02:04.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColEqualTimestampScalar(col 6, val 2001-01-01
01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
01:02:03.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColNotEqualTimestampScalar(col 6, val 2001-01-01
01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
01:02:04.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColGreaterEqualTimestampScalar(col 6, val 2001-01-01
01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
01:02:03.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColGreaterTimestampScalar(col 6, val 2001-01-01
01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
01:02:04.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColLessEqualTimestampScalar(col 6, val 2001-01-01
01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
01:02:03.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColLessTimestampScalar(col 6, val 2001-01-01
01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
01:02:04.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColEqualTimestampColumn(col 0, col 6)(children:
DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) ->
6:timestamp) -> boolean, FilterTimestampColNotEqualTimestampColumn(col 0, col
6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000)
-> 6:timestamp) -> boolean, FilterTimestampColLessEqualTimestampColumn(col 0,
col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
01:02:03.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColLessTimestampColumn(col 0, col 6)(children:
DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) ->
6:timestamp) -> boolean, FilterTimestampColGreaterEqualTimestampColumn(col 0,
col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
01:02:03.000000000) -> 6:timestamp) -> boolean,
FilterTimestampColGreaterTimestampColumn(col 0, col 6)(children:
DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) ->
6:timestamp) -> boolean) -> boolean
predicate: ((2001-01-01 01:02:03.0 = (dt + 0
01:02:03.000000000)) and (2001-01-01 01:02:03.0 <> (dt + 0 01:02:04.000000000))
and (2001-01-01 01:02:03.0 <= (dt + 0 01:02:03.000000000)) and (2001-01-01
01:02:03.0 < (dt + 0 01:02:04.000000000)) and (2001-01-01 01:02:03.0 >= (dt - 0
01:02:03.000000000)) and (2001-01-01 01:02:03.0 > (dt - 0 01:02:04.000000000))
and ((dt + 0 01:02:03.000000000) = 2001-01-01 01:02:03.0) and ((dt + 0
01:02:04.000000000) <> 2001-01-01 01:02:03.0) and ((dt + 0 01:02:03.000000000)
>= 2001-01-01 01:02:03.0) and ((dt + 0 01:02:04.000000000) > 2001-01-01
01:02:03.0) and ((dt - 0 01:02:03.000000000) <= 2001-01-01 01:02:03.0) and ((dt
- 0 01:02:04.000000000) < 2001-01-01 01:02:03.0) and (ts = (dt + 0
01:02:03.000000000)) and (ts <> (dt + 0 01:02:04.000000000)) and (ts <= (dt + 0
01:02:03.000000000)) and (ts < (dt + 0 01:02:04.000000000)) and (ts >= (dt - 0
01:02:03.000000000)) and (ts > (dt - 0 01:02:04.000000000))) (type: boolean)
Statistics: Num rows: 1 Data size: 394 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: ts (type: timestamp)
outputColumnNames: _col0
Select Vectorization:
className: VectorSelectOperator
native: true
projectedOutputColumns: [0]
Statistics: Num rows: 1 Data size: 394 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: timestamp)
sort order: +
Reduce Sink Vectorization:
className: VectorReduceSinkOperator
native: false
nativeConditionsMet:
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine
tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS
true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for
keys IS true, LazyBinarySerDe for values IS true
nativeConditionsNotMet: Uniform Hash IS false
Statistics: Num rows: 1 Data size: 394 Basic stats:
COMPLETE Column stats: NONE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map Vectorization:
enabled: true
enabledConditionsMet:
hive.vectorized.use.vectorized.input.format IS true
groupByVectorOutput: true
inputFileFormats:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
allNative: false
usesVectorUDFAdaptor: false
vectorized: true
Reducer 2
...
{code}
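A long predicateExpression like the one above can be hard to read. As a hedged sketch, the Python below counts the vectorized expression class names that appear in such a string, assuming a class name is any capitalized identifier immediately followed by "(" (a convention inferred from this output only). expression_classes is a hypothetical helper, and the sample is a shortened version of the expression shown above.

```python
import re
from collections import Counter

# Assumption: a class name is a capitalized identifier directly before "(".
CLASS_NAME = re.compile(r"\b([A-Z]\w*)\(")


def expression_classes(predicate_expression):
    """Count the expression class names appearing in the string."""
    return Counter(CLASS_NAME.findall(predicate_expression))


expr = (
    "FilterExprAndExpr(children: "
    "FilterTimestampScalarEqualTimestampColumn(val 2001-01-01 01:02:03.0, col 6)"
    "(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000)"
    " -> 6:timestamp) -> boolean, "
    "FilterTimestampColLessTimestampScalar(col 6, val 2001-01-01 01:02:03.0)"
    "(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000)"
    " -> 6:timestamp) -> boolean) -> boolean"
)

for name, count in sorted(expression_classes(expr).items()):
    print(count, name)
```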
The standard @Explain Annotation Type is used. A new 'vectorization'
annotation marks each new class and method.
Works for FORMATTED, like other non-vectorization EXPLAIN variations.
was:
Add detail to the EXPLAIN output showing why Map or Reduce work is not
vectorized.
New syntax is: EXPLAIN VECTORIZATION \[ONLY\]
\[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
The ONLY option suppresses most non-vectorization elements.
SUMMARY shows vectorization information for the PLAN (is vectorization enabled)
and a summary of Map and Reduce work.
OPERATOR shows vectorization information for operators. E.g. Filter
Vectorization. It includes all information of SUMMARY, too.
EXPRESSION shows vectorization information for expressions. E.g.
predicateExpression. It includes all information of SUMMARY and OPERATOR, too.
DETAIL shows the most detailed vectorization information.
It includes all information of SUMMARY, OPERATOR, and EXPRESSION, too.
The defaults for the optional clauses are non-ONLY and SUMMARY.
Here are some examples:
EXPLAIN VECTORIZATION example:
(Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization sections)
Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION SUMMARY.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION OPERATOR
Notice the added Select Vectorization, Group By Vectorization, Reduce Sink
Vectorization sections in this example.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION EXPRESSION
Notice the aaaaa in this example.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION DETAIL
Notice the aaaaa in this example.
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION ONLY example:
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION ONLY OPERATOR example:
{code}
coming soon…
{code}
EXPLAIN VECTORIZATION ONLY EXPRESSION example:
{code}
{code}
EXPLAIN VECTORIZATION ONLY DETAIL example:
{code}
coming soon…
{code}
The standard @Explain Annotation Type is used. A new 'vectorization'
annotation marks each new class and method.
Works for FORMATTED, like other non-vectorization EXPLAIN variations.
EXPLAIN VECTORIZATION FORMATTED example:
{code}
coming soon…
{code}
or pretty printed:
{code}
coming soon…
{code}
> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch,
> HIVE-11394.09.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\]
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators. E.g. Filter
> Vectorization. It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions. E.g.
> predicateExpression. It includes all information of SUMMARY and OPERATOR,
> too.
> DETAIL shows the most detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION, too.
> The defaults for the optional clauses are non-ONLY and SUMMARY.
> ---------------------------------------------------------------------------------------------------
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for
> GROUPBY operator: Data type struct<count:bigint,sum:double,input:int> of
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:
> false", which says the node has a GROUP BY with an AVG or some other
> aggregator that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream
> operators are row-mode, i.e. not vector output.
> "usesVectorUDFAdaptor: false" would be true if at least one vectorized
> expression were using VectorUDFAdaptor.
> And "allNative: false" will be true when all operators are native.
> Today, GROUP BY and FILE SINK are not native. MAP JOIN and REDUCE SINK are
> conditionally native. FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
> enabled: true
> enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> ...
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: alltypesorc
> Statistics: Num rows: 12288 Data size: 36696 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cint (type: int)
> outputColumnNames: cint
> Statistics: Num rows: 12288 Data size: 36696 Basic stats:
> COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: cint (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 5775 Data size: 17248 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 5775 Data size: 17248 Basic
> stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: false
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
> Group By Operator
> keys: KEY._col0 (type: int)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 5775 Data size: 17248 Basic stats:
> COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: sum(_col0), count(_col0), avg(_col0),
> std(_col0)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 172 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 172 Basic stats:
> COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: bigint), _col1 (type:
> bigint), _col2 (type: struct<count:bigint,sum:double,input:int>), _col3
> (type: struct<count:bigint,sum:double,variance:double>)
> Reducer 3
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: Aggregation Function UDF avg parameter
> expression for GROUPBY operator: Data type
> struct<count:bigint,sum:double,input:int> of Column[VALUE._col2] not supported
> vectorized: false
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0), count(VALUE._col1),
> avg(VALUE._col2), std(VALUE._col3)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE
> Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE
> Column stats: COMPLETE
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> EXPLAIN VECTORIZATION OPERATOR
> Notice the added TableScan Vectorization, Select Vectorization, Group By
> Vectorization, Map Join Vectorization, Reduce Sink Vectorization sections in
> this example.
> Notice the nativeConditionsMet detail on why Reduce Sink Vectorization is native.
> {code}
> PLAN VECTORIZATION:
> enabled: true
> enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> #### A masked pattern was here ####
> Edges:
> Map 2 <- Map 1 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> #### A masked pattern was here ####
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: a
> Statistics: Num rows: 3 Data size: 294 Basic stats:
> COMPLETE Column stats: NONE
> TableScan Vectorization:
> native: true
> projectedOutputColumns: [0, 1]
> Filter Operator
> Filter Vectorization:
> className: VectorFilterOperator
> native: true
> predicate: c2 is not null (type: boolean)
> Statistics: Num rows: 3 Data size: 294 Basic stats:
> COMPLETE Column stats: NONE
> Select Operator
> expressions: c1 (type: int), c2 (type: char(10))
> outputColumnNames: _col0, _col1
> Select Vectorization:
> className: VectorSelectOperator
> native: true
> projectedOutputColumns: [0, 1]
> Statistics: Num rows: 3 Data size: 294 Basic stats:
> COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col1 (type: char(20))
> sort order: +
> Map-reduce partition columns: _col1 (type: char(20))
> Reduce Sink Vectorization:
> className: VectorReduceSinkStringOperator
> native: true
> nativeConditionsMet:
> hive.vectorized.execution.reducesink.new.enabled IS true,
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE
> IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No
> DISTINCT columns IS true, BinarySortableSerDe for keys IS true,
> LazyBinarySerDe for values IS true
> Statistics: Num rows: 3 Data size: 294 Basic stats:
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: int)
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: true
> usesVectorUDFAdaptor: false
> vectorized: true
> Map 2
> Map Operator Tree:
> TableScan
> alias: b
> Statistics: Num rows: 3 Data size: 324 Basic stats:
> COMPLETE Column stats: NONE
> TableScan Vectorization:
> native: true
> projectedOutputColumns: [0, 1]
> Filter Operator
> Filter Vectorization:
> className: VectorFilterOperator
> native: true
> predicate: c2 is not null (type: boolean)
> Statistics: Num rows: 3 Data size: 324 Basic stats:
> COMPLETE Column stats: NONE
> Select Operator
> expressions: c1 (type: int), c2 (type: char(20))
> outputColumnNames: _col0, _col1
> Select Vectorization:
> className: VectorSelectOperator
> native: true
> projectedOutputColumns: [0, 1]
> Statistics: Num rows: 3 Data size: 324 Basic stats:
> COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col1 (type: char(20))
> 1 _col1 (type: char(20))
> Map Join Vectorization:
> className: VectorMapJoinInnerStringOperator
> native: true
> nativeConditionsMet:
> hive.vectorized.execution.mapjoin.native.enabled IS true,
> hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS
> true, No nullsafe IS true, Supports Key Types IS true, Not empty key IS true,
> When Fast Hash Table, then requires no Hybrid Hash Join IS true, Small table
> vectorizes IS true
> outputColumnNames: _col0, _col1, _col2, _col3
> input vertices:
> 0 Map 1
> Statistics: Num rows: 3 Data size: 323 Basic stats:
> COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Reduce Sink Vectorization:
> className: VectorReduceSinkOperator
> native: false
> nativeConditionsMet:
> hive.vectorized.execution.reducesink.new.enabled IS true,
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE
> IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true,
> BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
> nativeConditionsNotMet: Uniform Hash IS false
> Statistics: Num rows: 3 Data size: 323 Basic stats:
> COMPLETE Column stats: NONE
> value expressions: _col1 (type: char(10)), _col2
> (type: int), _col3 (type: char(20))
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 3
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: int), VALUE._col0
> (type: char(10)), VALUE._col1 (type: int), VALUE._col2 (type: char(20))
> outputColumnNames: _col0, _col1, _col2, _col3
> Select Vectorization:
> className: VectorSelectOperator
> native: true
> projectedOutputColumns: [0, 1, 2, 3]
> Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE
> Column stats: NONE
> File Output Operator
> compressed: false
> File Sink Vectorization:
> className: VectorFileSinkOperator
> native: false
> Statistics: Num rows: 3 Data size: 323 Basic stats:
> COMPLETE Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> EXPLAIN VECTORIZATION EXPRESSION
> Notice the predicateExpression in this example.
> {code}
> PLAN VECTORIZATION:
> enabled: true
> enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> #### A masked pattern was here ####
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> #### A masked pattern was here ####
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: vector_interval_2
> Statistics: Num rows: 2 Data size: 788 Basic stats:
> COMPLETE Column stats: NONE
> TableScan Vectorization:
> native: true
> projectedOutputColumns: [0, 1, 2, 3, 4, 5]
> Filter Operator
> Filter Vectorization:
> className: VectorFilterOperator
> native: true
> predicateExpression: FilterExprAndExpr(children:
> FilterTimestampScalarEqualTimestampColumn(val 2001-01-01 01:02:03.0, col
> 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000)
> -> 6:timestamp) -> boolean, FilterTimestampScalarNotEqualTimestampColumn(val
> 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col
> 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampScalarLessEqualTimestampColumn(val 2001-01-01 01:02:03.0, col
> 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000)
> -> 6:timestamp) -> boolean, FilterTimestampScalarLessTimestampColumn(val
> 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col
> 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampScalarGreaterEqualTimestampColumn(val 2001-01-01 01:02:03.0,
> col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
> 01:02:03.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampScalarGreaterTimestampColumn(val 2001-01-01 01:02:03.0, col
> 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
> 01:02:04.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColEqualTimestampScalar(col 6, val 2001-01-01
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
> 01:02:03.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColNotEqualTimestampScalar(col 6, val 2001-01-01
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
> 01:02:04.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColGreaterEqualTimestampScalar(col 6, val 2001-01-01
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
> 01:02:03.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColGreaterTimestampScalar(col 6, val 2001-01-01
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
> 01:02:04.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColLessEqualTimestampScalar(col 6, val 2001-01-01
> 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
> 01:02:03.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColLessTimestampScalar(col 6, val 2001-01-01
> 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
> 01:02:04.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColEqualTimestampColumn(col 0, col 6)(children:
> DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) ->
> 6:timestamp) -> boolean, FilterTimestampColNotEqualTimestampColumn(col 0, col
> 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000)
> -> 6:timestamp) -> boolean, FilterTimestampColLessEqualTimestampColumn(col 0,
> col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0
> 01:02:03.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColLessTimestampColumn(col 0, col 6)(children:
> DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) ->
> 6:timestamp) -> boolean, FilterTimestampColGreaterEqualTimestampColumn(col 0,
> col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0
> 01:02:03.000000000) -> 6:timestamp) -> boolean,
> FilterTimestampColGreaterTimestampColumn(col 0, col 6)(children:
> DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) ->
> 6:timestamp) -> boolean) -> boolean
> predicate: ((2001-01-01 01:02:03.0 = (dt + 0
> 01:02:03.000000000)) and (2001-01-01 01:02:03.0 <> (dt + 0
> 01:02:04.000000000)) and (2001-01-01 01:02:03.0 <= (dt + 0
> 01:02:03.000000000)) and (2001-01-01 01:02:03.0 < (dt + 0
> 01:02:04.000000000)) and (2001-01-01 01:02:03.0 >= (dt - 0
> 01:02:03.000000000)) and (2001-01-01 01:02:03.0 > (dt - 0
> 01:02:04.000000000)) and ((dt + 0 01:02:03.000000000) = 2001-01-01
> 01:02:03.0) and ((dt + 0 01:02:04.000000000) <> 2001-01-01 01:02:03.0) and
> ((dt + 0 01:02:03.000000000) >= 2001-01-01 01:02:03.0) and ((dt + 0
> 01:02:04.000000000) > 2001-01-01 01:02:03.0) and ((dt - 0 01:02:03.000000000)
> <= 2001-01-01 01:02:03.0) and ((dt - 0 01:02:04.000000000) < 2001-01-01
> 01:02:03.0) and (ts = (dt + 0 01:02:03.000000000)) and (ts <> (dt + 0
> 01:02:04.000000000)) and (ts <= (dt + 0 01:02:03.000000000)) and (ts < (dt +
> 0 01:02:04.000000000)) and (ts >= (dt - 0 01:02:03.000000000)) and (ts > (dt
> - 0 01:02:04.000000000))) (type: boolean)
> Statistics: Num rows: 1 Data size: 394 Basic stats:
> COMPLETE Column stats: NONE
> Select Operator
> expressions: ts (type: timestamp)
> outputColumnNames: _col0
> Select Vectorization:
> className: VectorSelectOperator
> native: true
> projectedOutputColumns: [0]
> Statistics: Num rows: 1 Data size: 394 Basic stats:
> COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: timestamp)
> sort order: +
> Reduce Sink Vectorization:
> className: VectorReduceSinkOperator
> native: false
> nativeConditionsMet:
> hive.vectorized.execution.reducesink.new.enabled IS true,
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE
> IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true,
> BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
> nativeConditionsNotMet: Uniform Hash IS false
> Statistics: Num rows: 1 Data size: 394 Basic stats:
> COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2
> ...
> {code}
> The standard @Explain Annotation Type is used. A new 'vectorization'
> annotation marks each new class and method.
> EXPLAIN VECTORIZATION also works with FORMATTED, as the other
> non-vectorization EXPLAIN variations do.
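The annotation-driven approach described above can be illustrated with a minimal, self-contained sketch. Every name in it (ExplainSketch, VectorizationLevel, FilterVectorizationSketch, ExplainRendererSketch) is hypothetical and invented for illustration; it is not Hive's actual @Explain definition or rendering code. It only assumes that each annotated getter carries a detail level and that a requested level includes all lower levels, as the SUMMARY, OPERATOR, EXPRESSION, DETAIL description states.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical renderer: collects annotated getters visible at the requested level.
class ExplainRendererSketch {
    static Map<String, String> render(Object desc, VectorizationLevel requested) {
        Map<String, String> out = new TreeMap<>();
        for (Method m : desc.getClass().getMethods()) {
            ExplainSketch a = m.getAnnotation(ExplainSketch.class);
            if (a == null) {
                continue; // not part of the explain output
            }
            // A higher requested level includes everything from the lower levels.
            if (a.vectorization().compareTo(requested) > 0) {
                continue;
            }
            try {
                out.put(a.displayName(), String.valueOf(m.invoke(desc)));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + m.getName(), e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FilterVectorizationSketch desc = new FilterVectorizationSketch();
        System.out.println(render(desc, VectorizationLevel.OPERATOR));
        System.out.println(render(desc, VectorizationLevel.EXPRESSION));
    }
}

// Hypothetical levels, ordered so that later constants include earlier ones.
enum VectorizationLevel { SUMMARY, OPERATOR, EXPRESSION, DETAIL }

// Hypothetical stand-in for an explain annotation with a 'vectorization' element.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface ExplainSketch {
    String displayName();
    VectorizationLevel vectorization() default VectorizationLevel.SUMMARY;
}

// Hypothetical descriptor modeled on the "Filter Vectorization:" section above.
@ExplainSketch(displayName = "Filter Vectorization", vectorization = VectorizationLevel.OPERATOR)
class FilterVectorizationSketch {
    @ExplainSketch(displayName = "className", vectorization = VectorizationLevel.OPERATOR)
    public String getClassName() { return "VectorFilterOperator"; }

    @ExplainSketch(displayName = "native", vectorization = VectorizationLevel.OPERATOR)
    public boolean isNative() { return true; }

    // Placeholder value; the real expression text is much longer.
    @ExplainSketch(displayName = "predicateExpression", vectorization = VectorizationLevel.EXPRESSION)
    public String getPredicateExpression() { return "FilterExprAndExpr(...)"; }
}
```

In this sketch, asking for OPERATOR yields only className and native, while EXPRESSION additionally yields predicateExpression, which mirrors the behaviour the examples above show.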