[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

Matt McCline (JIRA) Wed, 05 Oct 2016 20:58:02 -0700

     [ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Matt McCline updated HIVE-11394:
--------------------------------
    Description: 
Add detail to the EXPLAIN output showing why Map or Reduce work is not 
vectorized.

New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]

The ONLY option suppresses most non-vectorization elements.

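SUMMARY shows vectorization information for the PLAN (whether vectorization is 
enabled) and a summary for each Map and Reduce work. DETAIL adds per-operator 
vectorization sections (see the DETAIL example below).

The defaults for the optional clauses are no ONLY and SUMMARY.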
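
The grammar and defaults above can be sketched as a small helper that composes the statement text. This is only an illustration: the function name, its parameters, and the sample query are assumptions and are not part of the patch.

```python
def explain_vectorization(query, only=False, level="SUMMARY"):
    """Compose EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL] <query>.

    The defaults mirror the description: ONLY is off and the level is SUMMARY.
    """
    if level not in ("SUMMARY", "DETAIL"):
        raise ValueError("level must be SUMMARY or DETAIL")
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")
    # The level is always spelled out; omitting it would mean SUMMARY anyway.
    parts.append(level)
    parts.append(query)
    return " ".join(parts)


# Hypothetical query, loosely modeled on the first example's table and column.
query = "SELECT cdate FROM decimal_date_test ORDER BY cdate"
print(explain_vectorization(query))
print(explain_vectorization(query, only=True, level="DETAIL"))
```

The helper spells out the default level explicitly, so the generated text never depends on what the parser assumes for an omitted clause.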

Here are some examples:

EXPLAIN VECTORIZATION example:

(Note the PLAN VECTORIZATION, Map Vectorization, and Reduce Vectorization 
sections.)

This is the same as EXPLAIN VECTORIZATION SUMMARY.

{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
…
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
…
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: decimal_date_test
                  Statistics: Num rows: 12288 Data size: 2467616 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: boolean)
                    Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: cdate (type: date)
                      outputColumnNames: _col0
                      Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: date)
                        sort order: +
                        Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                groupByVectorOutput: true
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: date)
                outputColumnNames: _col0
                Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code}


EXPLAIN VECTORIZATION DETAIL example:

(Note the added Select Vectorization, Group By Vectorization, and Reduce Sink 
Vectorization sections in this example.)

{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
…
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
…
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: vectortab2korc
                  Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: bo (type: boolean), b (type: bigint)
                    outputColumnNames: bo, b
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        nativeConditionsMet: Supported IS true
                        selectExpressions: IdentityExpression[7:boolean], IdentityExpression[3:bigint]
                        vectorized: true
                    Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: max(b)
                      Group By Vectorization:
                          aggregators: VectorUDAFMaxLong(IdentityExpression[3:bigint])
                          className: VectorGroupByOperator
                          vectorOutput: true
                          keyExpressions: IdentityExpression[7:boolean]
                          native: false
                          nativeConditionsNotMet: Supported IS false
                          vectorized: true
                      keys: bo (type: boolean)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            native: true
                            nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                            vectorized: true
                        Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
            Execution mode: vectorized
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2 
            Execution mode: vectorized
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                groupByVectorOutput: true
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                Group By Vectorization:
                    aggregators: VectorUDAFMaxLong(IdentityExpression[1:bigint])
                    className: VectorGroupByOperator
                    vectorOutput: true
                    keyExpressions: IdentityExpression[0:boolean]
                    native: false
                    nativeConditionsNotMet: Supported IS false
                    vectorized: true
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: boolean)
                  sort order: -
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkOperator
                      native: false
                      nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      nativeConditionsNotMet: Uniform Hash IS false
                      vectorized: true
                  Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint)
…
{code}
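
The DETAIL sections above name the conditions that keep an operator from being native. As a rough sketch of how that text could be scanned, the snippet below collects every nativeConditionsNotMet entry together with the preceding className. It assumes each field sits on a single, unwrapped line, and the helper names are hypothetical. The sample text is abridged from the example above.

```python
import re

# Abridged from the DETAIL example above; indentation is not significant here.
plan_text = """\
Group By Vectorization:
    className: VectorGroupByOperator
    native: false
    nativeConditionsNotMet: Supported IS false
    vectorized: true
Reduce Sink Vectorization:
    className: VectorReduceSinkOperator
    native: false
    nativeConditionsNotMet: Uniform Hash IS false
    vectorized: true
"""


def split_conditions(value):
    """Split a condition list on commas that are not inside [...] brackets."""
    return [part.strip() for part in re.split(r",\s*(?![^\[]*\])", value)]


def unmet_native_conditions(text):
    """Return (className, [unmet conditions]) pairs in order of appearance."""
    results = []
    class_name = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("className:"):
            class_name = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("nativeConditionsNotMet:"):
            value = stripped.split(":", 1)[1]
            results.append((class_name, split_conditions(value)))
    return results


for class_name, conditions in unmet_native_conditions(plan_text):
    print(class_name, "->", "; ".join(conditions))
```

The bracket-aware split matters because a condition such as "hive.execution.engine tez IN [tez, spark] IS true" contains a comma of its own.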


EXPLAIN VECTORIZATION ONLY example:

{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      Vertices:
        Map 1 
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Map 2 
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true

  Stage: Stage-0
{code}


The standard @Explain annotation type is used. A new 'vectorization' 
annotation marks each new class and method.

EXPLAIN VECTORIZATION also works with FORMATTED, like the other, 
non-vectorization EXPLAIN variations.

EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example:

{code}
{"PLAN VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 3","type":"BROADCAST_EDGE"},{"parent":"Map 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 3":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 4":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer 2":{"Reduce Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled IS true","hive.execution.engine tez IN [tez, spark] IS true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
{code}

or pretty printed:

{code}
{
  "PLAN VECTORIZATION": {
    "enabled": true,
    "enabledConditionsMet": [
      "hive.vectorized.execution.enabled IS true"
    ]
  },
  "STAGE DEPENDENCIES": {
    "Stage-1": {
      "ROOT STAGE": "TRUE"
    },
    "Stage-0": {
      "DEPENDENT STAGES": "Stage-1"
    }
  },
  "STAGE PLANS": {
    "Stage-1": {
      "Tez": {
        "Edges:": {
          "Map 1": [
            {
              "parent": "Map 3",
              "type": "BROADCAST_EDGE"
            },
            {
              "parent": "Map 4",
              "type": "BROADCAST_EDGE"
            }
          ],
          "Reducer 2": {
            "parent": "Map 1",
            "type": "SIMPLE_EDGE"
          }
        },
        "Vertices:": {
          "Map 1": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 3": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 4": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Reducer 2": {
            "Reduce Vectorization:": {
              "enabled:": "true",
              "enableConditionsMet:": [
                "hive.vectorized.execution.reduce.enabled IS true",
                "hive.execution.engine tez IN [tez, spark] IS true"
              ],
              "groupByVectorOutput:": "true",
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          }
        }
      }
    },
    "Stage-0": {}
  }
}
{code}
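
Because the FORMATTED output is JSON, a script can pick out vertices that are not fully native. The following is a minimal sketch, assuming the key names shown above (including their trailing colons) and a Tez plan. The embedded document is a trimmed copy of the example, and the function name is hypothetical.

```python
import json

# Trimmed copy of the FORMATTED example above (three vertices, two fields each).
plan_json = """
{"PLAN VECTORIZATION": {"enabled": true},
 "STAGE PLANS": {"Stage-1": {"Tez": {"Vertices:": {
   "Map 1": {"Map Vectorization:": {"allNative:": "false", "vectorized:": "true"}},
   "Map 3": {"Map Vectorization:": {"allNative:": "true", "vectorized:": "true"}},
   "Reducer 2": {"Reduce Vectorization:": {"allNative:": "false", "vectorized:": "true"}}
 }}}, "Stage-0": {}}}
"""


def vectorization_summary(plan):
    """Return {vertex name: (vectorized, allNative)} for every Tez vertex."""
    summary = {}
    for stage in plan.get("STAGE PLANS", {}).values():
        vertices = stage.get("Tez", {}).get("Vertices:", {})
        for name, vertex in vertices.items():
            for key, section in vertex.items():
                if key.endswith("Vectorization:"):
                    # Per-vertex flags are the strings "true"/"false",
                    # unlike the boolean under "PLAN VECTORIZATION".
                    summary[name] = (section.get("vectorized:") == "true",
                                     section.get("allNative:") == "true")
    return summary


summary = vectorization_summary(json.loads(plan_json))
for name, (vectorized, all_native) in summary.items():
    print(name,
          "vectorized" if vectorized else "not vectorized",
          "all native" if all_native else "not all native")
```

Matching on the "Vectorization:" suffix lets one loop handle both the Map Vectorization and Reduce Vectorization sections.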





> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                 Key: HIVE-11394
>                 URL: https://issues.apache.org/jira/browse/HIVE-11394
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The optional clause defaults are not ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> …
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: decimal_date_test
>                   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: boolean)
>                     Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: cdate (type: date)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: date)
>                         sort order: +
>                         Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
>             Execution mode: vectorized, llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: date)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL example:
> (Note the added Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> …
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: vectortab2korc
>                   Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: bo (type: boolean), b (type: bigint)
>                     outputColumnNames: bo, b
>                     Select Vectorization:
>                         className: VectorSelectOperator
>                         native: true
>                         nativeConditionsMet: Supported IS true
>                         selectExpressions: IdentityExpression[7:boolean], IdentityExpression[3:bigint]
>                         vectorized: true
>                     Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: max(b)
>                       Group By Vectorization:
>                           aggregators: VectorUDAFMaxLong(IdentityExpression[3:bigint])
>                           className: VectorGroupByOperator
>                           vectorOutput: true
>                           keyExpressions: IdentityExpression[7:boolean]
>                           native: false
>                           nativeConditionsNotMet: Supported IS false
>                           vectorized: true
>                       keys: bo (type: boolean)
>                       mode: hash
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: boolean)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: boolean)
>                         Reduce Sink Vectorization:
>                             className: VectorReduceSinkLongOperator
>                             native: true
>                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                             vectorized: true
>                         Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>             Execution mode: vectorized
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
>             Execution mode: vectorized
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 Group By Vectorization:
>                     aggregators: VectorUDAFMaxLong(IdentityExpression[1:bigint])
>                     className: VectorGroupByOperator
>                     vectorOutput: true
>                     keyExpressions: IdentityExpression[0:boolean]
>                     native: false
>                     nativeConditionsNotMet: Supported IS false
>                     vectorized: true
>                 keys: KEY._col0 (type: boolean)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: boolean)
>                   sort order: -
>                   Reduce Sink Vectorization:
>                       className: VectorReduceSinkOperator
>                       native: false
>                       nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                       nativeConditionsNotMet: Uniform Hash IS false
>                       vectorized: true
>                   Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col1 (type: bigint)
> …
> {code}
> EXPLAIN VECTORIZATION ONLY example:
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 2 (BROADCAST_EDGE)
>       Vertices:
>         Map 1 
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Map 2 
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: true
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>   Stage: Stage-0
> {code}
> The implementation uses the standard @Explain annotation type. A new 
> 'vectorization' annotation marks each new class and method.
> FORMATTED output is also supported, as with the other (non-vectorization) 
> EXPLAIN variations.
> EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example:
> {code}
> {"PLAN VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 3","type":"BROADCAST_EDGE"},{"parent":"Map 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 3":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 4":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer 2":{"Reduce Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled IS true","hive.execution.engine tez IN [tez, spark] IS true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
> {code}
> or pretty printed:
> {code}
> {
>   "PLAN VECTORIZATION": {
>     "enabled": true,
>     "enabledConditionsMet": [
>       "hive.vectorized.execution.enabled IS true"
>     ]
>   },
>   "STAGE DEPENDENCIES": {
>     "Stage-1": {
>       "ROOT STAGE": "TRUE"
>     },
>     "Stage-0": {
>       "DEPENDENT STAGES": "Stage-1"
>     }
>   },
>   "STAGE PLANS": {
>     "Stage-1": {
>       "Tez": {
>         "Edges:": {
>           "Map 1": [
>             {
>               "parent": "Map 3",
>               "type": "BROADCAST_EDGE"
>             },
>             {
>               "parent": "Map 4",
>               "type": "BROADCAST_EDGE"
>             }
>           ],
>           "Reducer 2": {
>             "parent": "Map 1",
>             "type": "SIMPLE_EDGE"
>           }
>         },
>         "Vertices:": {
>           "Map 1": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 3": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 4": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Reducer 2": {
>             "Reduce Vectorization:": {
>               "enabled:": "true",
>               "enableConditionsMet:": [
>                 "hive.vectorized.execution.reduce.enabled IS true",
>                 "hive.execution.engine tez IN [tez, spark] IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           }
>         }
>       }
>     },
>     "Stage-0": {}
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
