[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-11394:
--------------------------------
Status: Patch Available (was: In Progress)
> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (whether vectorization
> is enabled) and a summary of Map and Reduce work.
> Both clauses are optional: by default ONLY is off and the level is SUMMARY.
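As a rough illustration of the grammar described above, the following Python sketch assembles the statement text from the two optional clauses. The helper name `build_explain_vectorization` and the sample query are made up for illustration and are not part of Hive or of this patch; the defaults simply mirror the description (no ONLY, and SUMMARY when the level is omitted).

```python
def build_explain_vectorization(query, only=False, level=None):
    """Assemble an EXPLAIN VECTORIZATION statement.

    Hypothetical helper for illustration only, not a Hive API.
    only  -- add the ONLY keyword
    level -- "SUMMARY", "DETAIL", or None to omit the clause
             (per the description, an omitted level means SUMMARY)
    """
    if level not in (None, "SUMMARY", "DETAIL"):
        raise ValueError("level must be SUMMARY, DETAIL, or None")
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")
    if level is not None:
        parts.append(level)
    parts.append(query)
    return " ".join(parts)


# Illustrative query; the table name is borrowed from the alias in the example plan.
query = "SELECT cdate FROM decimal_date_test"
print(build_explain_vectorization(query))
print(build_explain_vectorization(query, only=True, level="DETAIL"))
```

The first call relies on the defaults, so it produces the plain `EXPLAIN VECTORIZATION` form; the second spells out both optional clauses.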
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
> enabled: true
> enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> …
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: decimal_date_test
> Statistics: Num rows: 12288 Data size: 2467616 Basic stats:
> COMPLETE Column stats: NONE
> Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type:
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic
> stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: cdate (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic
> stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats:
> COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats:
> COMPLETE Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL example:
> (Note the added Select Vectorization, Group By Vectorization, Reduce Sink
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
> enabled: true
> enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> …
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: vectortab2korc
> Statistics: Num rows: 2000 Data size: 918712 Basic stats:
> COMPLETE Column stats: NONE
> Select Operator
> expressions: bo (type: boolean), b (type: bigint)
> outputColumnNames: bo, b
> Select Vectorization:
> className: VectorSelectOperator
> native: true
> nativeConditionsMet: Supported IS true
> selectExpressions: IdentityExpression[7:boolean],
> IdentityExpression[3:bigint]
> vectorized: true
> Statistics: Num rows: 2000 Data size: 918712 Basic stats:
> COMPLETE Column stats: NONE
> Group By Operator
> aggregations: max(b)
> Group By Vectorization:
> aggregators:
> VectorUDAFMaxLong(IdentityExpression[3:bigint])
> className: VectorGroupByOperator
> vectorOutput: true
> keyExpressions: IdentityExpression[7:boolean]
> native: false
> nativeConditionsNotMet: Supported IS false
> vectorized: true
> keys: bo (type: boolean)
> mode: hash
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 2000 Data size: 918712 Basic
> stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: boolean)
> sort order: +
> Map-reduce partition columns: _col0 (type: boolean)
> Reduce Sink Vectorization:
> className: VectorReduceSinkLongOperator
> native: true
> nativeConditionsMet:
> hive.vectorized.execution.reducesink.new.enabled IS true,
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE
> IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No
> DISTINCT columns IS true, BinarySortableSerDe for keys IS true,
> LazyBinarySerDe for values IS true
> vectorized: true
> Statistics: Num rows: 2000 Data size: 918712 Basic
> stats: COMPLETE Column stats: NONE
> value expressions: _col1 (type: bigint)
> Execution mode: vectorized
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2
> Execution mode: vectorized
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
> Group By Operator
> aggregations: max(VALUE._col0)
> Group By Vectorization:
> aggregators:
> VectorUDAFMaxLong(IdentityExpression[1:bigint])
> className: VectorGroupByOperator
> vectorOutput: true
> keyExpressions: IdentityExpression[0:boolean]
> native: false
> nativeConditionsNotMet: Supported IS false
> vectorized: true
> keys: KEY._col0 (type: boolean)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 1000 Data size: 459356 Basic stats:
> COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: boolean)
> sort order: -
> Reduce Sink Vectorization:
> className: VectorReduceSinkOperator
> native: false
> nativeConditionsMet:
> hive.vectorized.execution.reducesink.new.enabled IS true,
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE
> IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true,
> BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
> nativeConditionsNotMet: Uniform Hash IS false
> vectorized: true
> Statistics: Num rows: 1000 Data size: 459356 Basic stats:
> COMPLETE Column stats: NONE
> value expressions: _col1 (type: bigint)
> …
> {code}
> EXPLAIN VECTORIZATION ONLY example:
> {code}
> PLAN VECTORIZATION:
> enabled: true
> enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Vertices:
> Map 1
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Map 2
> Map Vectorization:
> enabled: true
> enabledConditionsMet:
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats:
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: true
> usesVectorUDFAdaptor: false
> vectorized: true
> Stage: Stage-0
> {code}
> The standard @Explain Annotation Type is used. A new 'vectorization'
> annotation marks each new class and method.
> EXPLAIN VECTORIZATION also works with FORMATTED, like the other
> (non-vectorization) EXPLAIN variations.
> EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example:
> {code}
> {"PLAN
> VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled
> IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT
> STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE
> PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map
> 3","type":"BROADCAST_EDGE"},{"parent":"Map
> 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map
> 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map
> Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format
> IS
> true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map
> 3":{"Map
> Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format
> IS
> true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map
> 4":{"Map
> Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format
> IS
> true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer
> 2":{"Reduce
> Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled
> IS true","hive.execution.engine tez IN [tez, spark] IS
> true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
> {code}
> or pretty printed:
> {code}
> {
>   "PLAN VECTORIZATION": {
>     "enabled": true,
>     "enabledConditionsMet": [
>       "hive.vectorized.execution.enabled IS true"
>     ]
>   },
>   "STAGE DEPENDENCIES": {
>     "Stage-1": {
>       "ROOT STAGE": "TRUE"
>     },
>     "Stage-0": {
>       "DEPENDENT STAGES": "Stage-1"
>     }
>   },
>   "STAGE PLANS": {
>     "Stage-1": {
>       "Tez": {
>         "Edges:": {
>           "Map 1": [
>             {
>               "parent": "Map 3",
>               "type": "BROADCAST_EDGE"
>             },
>             {
>               "parent": "Map 4",
>               "type": "BROADCAST_EDGE"
>             }
>           ],
>           "Reducer 2": {
>             "parent": "Map 1",
>             "type": "SIMPLE_EDGE"
>           }
>         },
>         "Vertices:": {
>           "Map 1": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 3": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 4": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Reducer 2": {
>             "Reduce Vectorization:": {
>               "enabled:": "true",
>               "enableConditionsMet:": [
>                 "hive.vectorized.execution.reduce.enabled IS true",
>                 "hive.execution.engine tez IN [tez, spark] IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           }
>         }
>       }
>     },
>     "Stage-0": {}
>   }
> }
> {code}
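Because the FORMATTED variant is plain JSON, it can be post-processed by a script. The sketch below is one possible way to list the vertices whose work is not entirely native; it is not part of the patch. It assumes the key layout seen in the output above (keys such as `Vertices:` and `allNative:` carry a trailing colon, and the flags are the strings "true"/"false"), uses a trimmed copy of that output as sample input, and the function names are hypothetical.

```python
import json

# Trimmed copy of the FORMATTED output shown above (three vertices, two flags each).
SAMPLE = '''
{"STAGE PLANS": {"Stage-1": {"Tez": {"Vertices:": {
  "Map 1": {"Map Vectorization:": {"allNative:": "false", "vectorized:": "true"}},
  "Map 3": {"Map Vectorization:": {"allNative:": "true", "vectorized:": "true"}},
  "Reducer 2": {"Reduce Vectorization:": {"allNative:": "false", "vectorized:": "true"}}
}}}, "Stage-0": {}}}
'''


def vectorization_sections(plan):
    """Yield (stage, vertex, section_name, flags) for each Map/Reduce Vectorization section."""
    for stage_name, stage in plan.get("STAGE PLANS", {}).items():
        for engine in stage.values():  # e.g. the "Tez" object; an empty stage yields nothing
            if not isinstance(engine, dict):
                continue
            for vertex_name, vertex in engine.get("Vertices:", {}).items():
                for section_name, flags in vertex.items():
                    if section_name.endswith("Vectorization:"):
                        yield stage_name, vertex_name, section_name, flags


def not_all_native(plan):
    """Return the names of vertices whose allNative flag is not the string "true"."""
    return [vertex for _, vertex, _, flags in vectorization_sections(plan)
            if flags.get("allNative:") != "true"]


plan = json.loads(SAMPLE)
print(not_all_native(plan))
```

With the trimmed sample, the vertices reported are Map 1 and Reducer 2, matching the `allNative: false` entries in the output above.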
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)