[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

Matt McCline (JIRA) Mon, 03 Oct 2016 23:34:47 -0700

     [ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Matt McCline updated HIVE-11394:
--------------------------------
    Description: 
Add detail to the EXPLAIN output showing why a Map or Reduce task was not 
vectorized.

Add a new VECTORIZATION option that displays three levels of detail.  Here are some examples:

At the beginning of the plan:
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
{code}

For Map and Reduce nodes:
{code}
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: false
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
{code}



{code}
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct<count:bigint,sum:decimal(38,18),input:decimal(38,18)> of Column[VALUE._col3] not supported
                vectorized: false
{code}

And, for each vectorized operator:
{code}
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        nativeConditionsMet: Supported IS true
                        selectExpressions: IdentityExpression[6:decimal(38,18)]
                        vectorized: true
{code}

{code}
                      Map Join Vectorization:
                          className: VectorMapJoinOperator
                          native: false
                          nativeConditionsMet: hive.vectorized.execution.mapjoin.native.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS true, No nullsafe IS true, Supports Key Types IS true, When Fast Hash Table, then requires no Hybrid Hash Join IS true, Small table vectorizes IS true
                          nativeConditionsNotMet: Not empty key IS false
                          vectorized: true
{code}

The standard @Explain Annotation Type is used.  A new 'vectorization' 
annotation marks each new class and method.
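
The annotation-driven approach described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the nested `Explain` annotation is a hypothetical stand-in (not Hive's real definition), the boolean `vectorization` element is only one possible shape for the new marker, and `SelectVectorizationDesc` and `collect` are invented names. The idea shown is simply that only getters carrying the vectorization marker are picked up for display.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class ExplainVectorizationSketch {

    /** Hypothetical stand-in for Hive's @Explain annotation; not the real definition. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    public @interface Explain {
        String displayName();
        // Assumed shape of the new 'vectorization' marker.
        boolean vectorization() default false;
    }

    /** Hypothetical descriptor for one vectorized operator. */
    @Explain(displayName = "Select Vectorization", vectorization = true)
    public static class SelectVectorizationDesc {
        @Explain(displayName = "className", vectorization = true)
        public String getClassName() { return "VectorSelectOperator"; }

        @Explain(displayName = "native", vectorization = true)
        public String getNative() { return "true"; }

        // Not annotated, so it is never collected for display.
        public String getInternalState() { return "hidden"; }
    }

    /** Collects displayName -> value for every getter marked with vectorization = true. */
    public static Map<String, String> collect(Object desc) {
        Map<String, String> fields = new TreeMap<>();
        for (Method m : desc.getClass().getMethods()) {
            Explain e = m.getAnnotation(Explain.class);
            if (e == null || !e.vectorization()) {
                continue;
            }
            try {
                fields.put(e.displayName(), String.valueOf(m.invoke(desc)));
            } catch (ReflectiveOperationException ex) {
                throw new IllegalStateException("Cannot read " + m.getName(), ex);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        SelectVectorizationDesc desc = new SelectVectorizationDesc();
        Explain header = desc.getClass().getAnnotation(Explain.class);
        System.out.println(header.displayName() + ":");
        collect(desc).forEach((k, v) -> System.out.println("    " + k + ": " + v));
    }
}
```

Keeping the marker on the annotation means the display code does not need to know about individual operator classes; it only inspects what each getter declares.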

Also works with EXPLAIN FORMATTED, like the other non-vectorization variations.

Consider adding options to show only vectorization information:

EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL]

where the current patch is equivalent to EXPLAIN VECTORIZATION DETAIL.

SUMMARY would add PLAN VECTORIZATION and Map/Reduce Vectorization, but not 
operator detail.

ONLY would suppress most non-vectorization elements.
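
As a rough illustration of the proposed option grammar, the sketch below parses the text that would follow EXPLAIN VECTORIZATION and reports which parts of the plan would be shown. It is not taken from the patch: the class and method names are hypothetical, treating a missing level as SUMMARY is an assumption (the description does not state a default), and ONLY is reduced to a single flag even though the description says it suppresses "most", not all, non-vectorization elements.

```java
import java.util.Locale;

public class ExplainVectorizationOptions {
    private final boolean only;
    private final boolean detail;

    private ExplainVectorizationOptions(boolean only, boolean detail) {
        this.only = only;
        this.detail = detail;
    }

    /** Parses the text after "EXPLAIN VECTORIZATION", for example "ONLY DETAIL". */
    public static ExplainVectorizationOptions parse(String optionText) {
        boolean only = false;
        boolean detail = false;      // assumption: SUMMARY when no level is given
        boolean levelSeen = false;
        String trimmed = optionText.trim();
        if (!trimmed.isEmpty()) {
            for (String token : trimmed.split("\\s+")) {
                switch (token.toUpperCase(Locale.ROOT)) {
                    case "ONLY":
                        // [ONLY] may appear once and must precede the level.
                        if (only || levelSeen) {
                            throw new IllegalArgumentException("Misplaced ONLY");
                        }
                        only = true;
                        break;
                    case "SUMMARY":
                    case "DETAIL":
                        if (levelSeen) {
                            throw new IllegalArgumentException("Level given twice");
                        }
                        levelSeen = true;
                        detail = token.equalsIgnoreCase("DETAIL");
                        break;
                    default:
                        throw new IllegalArgumentException("Unknown option: " + token);
                }
            }
        }
        return new ExplainVectorizationOptions(only, detail);
    }

    /** PLAN VECTORIZATION and Map/Reduce Vectorization appear at both levels. */
    public boolean showsPlanAndTaskVectorization() { return true; }

    /** Per-operator blocks such as Select Vectorization appear only with DETAIL. */
    public boolean showsOperatorDetail() { return detail; }

    /** Simplification: ONLY is modeled as hiding the non-vectorization elements. */
    public boolean showsNonVectorizationElements() { return !only; }

    public static void main(String[] args) {
        for (String text : new String[] {"", "DETAIL", "ONLY SUMMARY"}) {
            ExplainVectorizationOptions o = parse(text);
            System.out.println("[" + text + "] operatorDetail=" + o.showsOperatorDetail()
                    + " nonVectorization=" + o.showsNonVectorizationElements());
        }
    }
}
```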

  was:
Add detail to the EXPLAIN output showing why a Map or Reduce task was not 
vectorized.

Add new VECTORIZATION option that displays 3 levels.  Here are some examples:

(At the beginning)
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
{code}

For Map and Reduce nodes:
{code}
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: false
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
{code}



{code}
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct<count:bigint,sum:decimal(38,18),input:decimal(38,18)> of Column[VALUE._col3] not supported
                vectorized: false
{code}

And, for each vectorized operator:
{code}
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        nativeConditionsMet: Supported IS true
                        selectExpressions: IdentityExpression[6:decimal(38,18)]
                        vectorized: true
{code}

{code}
                      Map Join Vectorization:
                          className: VectorMapJoinOperator
                          native: false
                          nativeConditionsMet: hive.vectorized.execution.mapjoin.native.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS true, No nullsafe IS true, Supports Key Types IS true, When Fast Hash Table, then requires no Hybrid Hash Join IS true, Small table vectorizes IS true
                          nativeConditionsNotMet: Not empty key IS false
                          vectorized: true
{code}

The standard @Explain Annotation Type is used.  A new 'vectorization' 
annotation marks each new class and method.

Works for FORMATTED, like other non-vectorization variations.

Consider adding options to just show Vectorization information:

EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL]

where current patch is equivalent to EXPLAIN VECTORIZATION DETAIL.

SUMMARY would just show PLAN VECTORIZATION and Map/Reduce Vectorization, but 
not operator detail.


> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                 Key: HIVE-11394
>                 URL: https://issues.apache.org/jira/browse/HIVE-11394
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-11394.01.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
