[jira] [Commented] (HIVE-11394) Enhance EXPLAIN display for vectorization

Matt McCline (JIRA) Fri, 14 Oct 2016 00:48:40 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574571#comment-15574571
 ]


Matt McCline commented on HIVE-11394:
-------------------------------------

Why would my change cause this in the log for TestMiniLlapCliDriver running 
orc_llap.q? (I haven't tried the local driver yet.)

{code}
2016-10-14T00:26:12,500 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Received heartbeat response from AM, response={  lastRequestId=12, 
shouldDie=false, nextFromEventId=0, nextPreRoutedEventId=1, eventCount=0 }
2016-10-14T00:26:12,504  INFO [TezTaskRunner] LlapIoImpl: Llap counters: 
Fragment counters for [hw10891.local/192.168.1.129, warehouse.orc_llap, 
hdfs://localhost:65227/build/ql/test/data/warehouse/orc_llap/000000_0 (16806), 
0,1]: [ NUM_VECTOR_BATCHES=123, NUM_DECODED_BATCHES=123, 
SELECTED_ROWGROUPS=123, NUM_ERRORS=0, ROWS_EMITTED=122880, 
METADATA_CACHE_HIT=2, METADATA_CACHE_MISS=0, CACHE_HIT_BYTES=308950, 
CACHE_MISS_BYTES=0, ALLOCATED_BYTES=0, ALLOCATED_USED_BYTES=0, 
TOTAL_IO_TIME_NS=166039000, DECODE_TIME_NS=35640000, HDFS_TIME_NS=90000, 
CONSUMER_TIME_NS=906792000 ]
2016-10-14T00:26:12,506 DEBUG [TezTaskRunner] encoded.OrcEncodedDataReader: 
Encoded reader is being stopped
2016-10-14T00:26:12,506 DEBUG [TezTaskRunner] log.PerfLogger: </PERFLOG 
method=TezRunProcessor start=1476429971363 end=1476429972506 duration=1143 
from=org.apache.hadoop.hive.ql.exec.tez.TezProcessor>
2016-10-14T00:26:12,508 DEBUG [TezTaskRunner] mr.ObjectCache: 
mmccline_20161014002608_27a7ab5e-e018-482b-a69d-e3e92ffc9f1f_Map 1__MAP_PLAN__ 
no longer needed
2016-10-14T00:26:12,510  INFO [TezTaskRunner] vector.VectorMapOperator: 
RECORDS_IN_Map_1:122880, DESERIALIZE_ERRORS:0, 
2016-10-14T00:26:12,510  INFO [TezTaskRunner] 
reducesink.VectorReduceSinkCommonOperator: RS[23]: records written - 60590
2016-10-14T00:26:12,510  INFO [TezTaskRunner] 
reducesink.VectorReduceSinkLongOperator: RECORDS_OUT_INTERMEDIATE_Map_1:60590, 
2016-10-14T00:26:12,512  INFO [TezTaskRunner] task.TaskRunner2Callable: Closing 
task, taskAttemptId=attempt_1476429881863_0001_39_01_000000_0
2016-10-14T00:26:12,513  INFO [TezTaskRunner] impl.PipelinedSorter: Reducer 2: 
Starting flush of map output
2016-10-14T00:26:12,513  INFO [TezTaskRunner] impl.PipelinedSorter: Reducer 2: 
Span0.length = 60590, perItem = 12
2016-10-14T00:26:12,535  INFO [TezTaskRunner] LlapIoImpl: Llap counters: 
Fragment counters for [hw10891.local/192.168.1.129, warehouse.orc_llap, 
hdfs://localhost:65227/build/ql/test/data/warehouse/orc_llap/000000_0 (16806), 
0,1]: [ NUM_VECTOR_BATCHES=123, NUM_DECODED_BATCHES=123, 
SELECTED_ROWGROUPS=123, NUM_ERRORS=0, ROWS_EMITTED=122880, 
METADATA_CACHE_HIT=2, METADATA_CACHE_MISS=0, CACHE_HIT_BYTES=309552, 
CACHE_MISS_BYTES=0, ALLOCATED_BYTES=0, ALLOCATED_USED_BYTES=0, 
TOTAL_IO_TIME_NS=194286000, DECODE_TIME_NS=46068000, HDFS_TIME_NS=690000, 
CONSUMER_TIME_NS=939666000 ]
2016-10-14T00:26:12,535 DEBUG [TezTaskRunner] encoded.OrcEncodedDataReader: 
Encoded reader is being stopped
2016-10-14T00:26:12,535 DEBUG [TezTaskRunner] log.PerfLogger: </PERFLOG 
method=TezRunProcessor start=1476429971365 end=1476429972535 duration=1170 
from=org.apache.hadoop.hive.ql.exec.tez.TezProcessor>
2016-10-14T00:26:12,535 DEBUG [TezTaskRunner] mr.ObjectCache: 
mmccline_20161014002608_27a7ab5e-e018-482b-a69d-e3e92ffc9f1f_Map 4__MAP_PLAN__ 
no longer needed
2016-10-14T00:26:12,535  INFO [TezTaskRunner] vector.VectorMapOperator: 
RECORDS_IN_Map_4:122880, DESERIALIZE_ERRORS:0, 
2016-10-14T00:26:12,535  INFO [TezTaskRunner] 
reducesink.VectorReduceSinkCommonOperator: RS[26]: records written - 60590
2016-10-14T00:26:12,536  INFO [TezTaskRunner] 
reducesink.VectorReduceSinkLongOperator: RECORDS_OUT_INTERMEDIATE_Map_4:60590, 
2016-10-14T00:26:12,536  INFO [TezTaskRunner] task.TaskRunner2Callable: Closing 
task, taskAttemptId=attempt_1476429881863_0001_39_00_000000_0
2016-10-14T00:26:12,536  INFO [TezTaskRunner] impl.PipelinedSorter: Reducer 2: 
Starting flush of map output
2016-10-14T00:26:12,536  INFO [TezTaskRunner] impl.PipelinedSorter: Reducer 2: 
Span0.length = 60590, perItem = 23
2016-10-14T00:26:12,555 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Sending heartbeat to AM, request={  
containerId=container_222212222_0001_01_000065, requestId=11, startIndex=0, 
preRoutedStartIndex=1, maxEventsToGet=500, 
taskAttemptId=attempt_1476429881863_0001_39_01_000000_0, eventCount=1 }
2016-10-14T00:26:12,557 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Received heartbeat response from AM, response={  lastRequestId=11, 
shouldDie=false, nextFromEventId=0, nextPreRoutedEventId=1, eventCount=0 }
2016-10-14T00:26:12,704 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Sending heartbeat to AM, request={  
containerId=container_222212222_0001_01_000065, requestId=12, startIndex=0, 
preRoutedStartIndex=1, maxEventsToGet=500, 
taskAttemptId=attempt_1476429881863_0001_39_01_000000_0, eventCount=1 }
2016-10-14T00:26:12,706 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Received heartbeat response from AM, response={  lastRequestId=12, 
shouldDie=false, nextFromEventId=0, nextPreRoutedEventId=1, eventCount=0 }
2016-10-14T00:26:12,708 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Sending heartbeat to AM, request={  
containerId=container_222212222_0001_01_000066, requestId=13, startIndex=0, 
preRoutedStartIndex=1, maxEventsToGet=500, 
taskAttemptId=attempt_1476429881863_0001_39_00_000000_0, eventCount=1 }
2016-10-14T00:26:12,709 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Received heartbeat response from AM, response={  lastRequestId=13, 
shouldDie=false, nextFromEventId=0, nextPreRoutedEventId=1, eventCount=0 }
2016-10-14T00:26:12,752  INFO [TezTaskRunner] impl.PipelinedSorter: Reducer 2: 
done sorting span=0, length=60590, time=239
2016-10-14T00:26:12,754  INFO [TezTaskRunner] impl.PipelinedSorter: Reducer 2: 
Heap = SpanIterator<0:60589> (span=Span[5242880,764210]),
2016-10-14T00:26:12,767  INFO [TezTaskRunner] impl.PipelinedSorter: Reducer 2: 
Spilling to 
/Users/mmccline/VecDetail/itests/qtest/target/llap_MiniLlapCluster/localDir/usercache/mmccline/appcache/application_1476429881863_0001/39/output/attempt_1476429881863_0001_39_01_000000_0_10197_0/file.out
2016-10-14T00:26:12,770 DEBUG [TezTaskRunner] util.Progress: Illegal progress 
value found, progress is less than 0. Progress will be changed to 0
2016-10-14T00:26:12,773 DEBUG [TezTaskRunner] util.Progress: Illegal progress 
value found, progress is less than 0. Progress will be changed to 0
2016-10-14T00:26:12,773 DEBUG [TezTaskRunner] util.Progress: Illegal progress 
value found, progress is less than 0. Progress will be changed to 0
2016-10-14T00:26:12,773 DEBUG [TezTaskRunner] util.Progress: Illegal progress 
value found, progress is less than 0. Progress will be changed to 0
2016-10-14T00:26:12,774 DEBUG [TezTaskRunner] util.Progress: Illegal progress 
value found, progress is less than 0. Progress will be changed to 0
2016-10-14T00:26:12,774 DEBUG [TezTaskRunner] util.Progress: Illegal progress 
value found, progress is less than 0. Progress will be changed to 0
2016-10-14T00:26:12,774 DEBUG [TezTaskRunner] util.Progress: Illegal progress 
value found, progress is less than 0. Progress will be changed to 0

...
{code}

I can revert the change, but I don't understand why the end of the log is 
filled with:

{code}
2016-10-14T00:39:12,381 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Sending heartbeat to AM, request={  
containerId=container_222212222_0001_01_000082, requestId=6986, startIndex=2, 
preRoutedStartIndex=0, maxEventsToGet=500, 
taskAttemptId=attempt_1476429881863_0001_46_02_000000_0, eventCount=1 }
2016-10-14T00:39:12,383 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter: 
Received heartbeat response from AM, response={  lastRequestId=6986, 
shouldDie=false, nextFromEventId=2, nextPreRoutedEventId=0, eventCount=0 }
{code}

Do I have an infinite loop in my code?
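
One way to narrow this down is to group the heartbeat entries by task attempt. Below is a minimal Python sketch, assuming the log keeps the LlapTaskReporter format shown in the excerpts above; the function name summarize_heartbeats is hypothetical and not part of Hive or Tez. Heartbeats that keep arriving for a single attempt with shouldDie=false suggest that the attempt is still running or stuck, but they do not prove an infinite loop on their own.

```python
import re

# One "Sending heartbeat" entry, after whitespace has been collapsed.
HEARTBEAT = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),\d+ DEBUG \[TaskHeartbeatThread\] "
    r"impl\.LlapTaskReporter: Sending heartbeat to AM, request=\{ "
    r"containerId=\S+, requestId=(\d+),.*?taskAttemptId=(\S+), eventCount=\d+ \}"
)


def summarize_heartbeats(log_text):
    """Map each taskAttemptId to (first_seen, last_seen, highest_request_id)."""
    flat = " ".join(log_text.split())  # undo the mail-style line wrapping
    summary = {}
    for ts, request_id, attempt in HEARTBEAT.findall(flat):
        first, _, top = summary.get(attempt, (ts, ts, 0))
        summary[attempt] = (first, ts, max(top, int(request_id)))
    return summary


# Three entries copied from the log excerpts above.
SAMPLE = """
2016-10-14T00:26:12,704 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter:
Sending heartbeat to AM, request={
containerId=container_222212222_0001_01_000065, requestId=12, startIndex=0,
preRoutedStartIndex=1, maxEventsToGet=500,
taskAttemptId=attempt_1476429881863_0001_39_01_000000_0, eventCount=1 }
2016-10-14T00:26:12,708 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter:
Sending heartbeat to AM, request={
containerId=container_222212222_0001_01_000066, requestId=13, startIndex=0,
preRoutedStartIndex=1, maxEventsToGet=500,
taskAttemptId=attempt_1476429881863_0001_39_00_000000_0, eventCount=1 }
2016-10-14T00:39:12,381 DEBUG [TaskHeartbeatThread] impl.LlapTaskReporter:
Sending heartbeat to AM, request={
containerId=container_222212222_0001_01_000082, requestId=6986, startIndex=2,
preRoutedStartIndex=0, maxEventsToGet=500,
taskAttemptId=attempt_1476429881863_0001_46_02_000000_0, eventCount=1 }
"""

if __name__ == "__main__":
    for attempt, (first, last, top) in summarize_heartbeats(SAMPLE).items():
        print(attempt, first, last, top)
```

Run over the full log, an attempt whose last_seen timestamp keeps advancing long after the others have stopped is the one to inspect, for example with a thread dump of the daemon.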

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                 Key: HIVE-11394
>                 URL: https://issues.apache.org/jira/browse/HIVE-11394
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>              Labels: TODOC2.2
>             Fix For: 2.2.0
>
>         Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, 
> HIVE-11394.09.patch, HIVE-11394.091.patch, HIVE-11394.092.patch, 
> HIVE-11394.093.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] 
> \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators.  E.g. Filter 
> Vectorization.  It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions.  E.g. 
> predicateExpression.  It includes all information of SUMMARY and OPERATOR, 
> too.
> DETAIL shows detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> When the optional clauses are omitted, the defaults are no ONLY and SUMMARY.
> ---------------------------------------------------------------------------------------------------
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> Since SUMMARY is the default, it is the output of EXPLAIN VECTORIZATION 
> SUMMARY.
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for 
> GROUPBY operator: Data type struct<count:bigint,sum:double,input:int> of 
> Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:": 
> "false" which says a node has a GROUP BY with an AVG or some other aggregator 
> that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators 
> are row-mode.  I.e. not vector output.
> If "usesVectorUDFAdaptor:" were "true" instead of "false", it would mean 
> that at least one vectorized expression is using VectorUDFAdaptor.
> And "allNative:", shown here as "false", will be "true" when all operators 
> are native.  Today, GROUP BY and FILE SINK are not native.  MAP JOIN and 
> REDUCE SINK are conditionally native.  FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> ...
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: alltypesorc
>                   Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: cint (type: int)
>                     outputColumnNames: cint
>                     Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: cint (type: int)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 5775 Data size: 17248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
>             Execution mode: vectorized, llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: false
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 5775 Data size: 17248 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                 Group By Operator
>                   aggregations: sum(_col0), count(_col0), avg(_col0), 
> std(_col0)
>                   mode: hash
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 1 Data size: 172 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     sort order: 
>                     Statistics: Num rows: 1 Data size: 172 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     value expressions: _col0 (type: bigint), _col1 (type: 
> bigint), _col2 (type: struct<count:bigint,sum:double,input:int>), _col3 
> (type: struct<count:bigint,sum:double,variance:double>)
>         Reducer 3 
>             Execution mode: llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 notVectorizedReason: Aggregation Function UDF avg parameter 
> expression for GROUPBY operator: Data type 
> struct<count:bigint,sum:double,input:int> of Column[VALUE._col2] not supported
>                 vectorized: false
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0), count(VALUE._col1), 
> avg(VALUE._col2), std(VALUE._col3)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   table:
>                       input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink 
> {code}
> EXPLAIN VECTORIZATION OPERATOR
> Notice the added TableScan Vectorization, Select Vectorization, Group By 
> Vectorization, Map Join Vectorization, Reduce Sink Vectorization sections in 
> this example.
> Notice the nativeConditionsMet detail on why Reduce Sink Vectorization is 
> native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Map 2 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 2 (SIMPLE_EDGE)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   Statistics: Num rows: 3 Data size: 294 Basic stats: 
> COMPLETE Column stats: NONE
>                   TableScan Vectorization:
>                       native: true
>                       projectedOutputColumns: [0, 1]
>                   Filter Operator
>                     Filter Vectorization:
>                         className: VectorFilterOperator
>                         native: true
>                     predicate: c2 is not null (type: boolean)
>                     Statistics: Num rows: 3 Data size: 294 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: c1 (type: int), c2 (type: char(10))
>                       outputColumnNames: _col0, _col1
>                       Select Vectorization:
>                           className: VectorSelectOperator
>                           native: true
>                           projectedOutputColumns: [0, 1]
>                       Statistics: Num rows: 3 Data size: 294 Basic stats: 
> COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col1 (type: char(20))
>                         sort order: +
>                         Map-reduce partition columns: _col1 (type: char(20))
>                         Reduce Sink Vectorization:
>                             className: VectorReduceSinkStringOperator
>                             native: true
>                             nativeConditionsMet: 
> hive.vectorized.execution.reducesink.new.enabled IS true, 
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE 
> IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No 
> DISTINCT columns IS true, BinarySortableSerDe for keys IS true, 
> LazyBinarySerDe for values IS true
>                         Statistics: Num rows: 3 Data size: 294 Basic stats: 
> COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: int)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: true
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   Statistics: Num rows: 3 Data size: 324 Basic stats: 
> COMPLETE Column stats: NONE
>                   TableScan Vectorization:
>                       native: true
>                       projectedOutputColumns: [0, 1]
>                   Filter Operator
>                     Filter Vectorization:
>                         className: VectorFilterOperator
>                         native: true
>                     predicate: c2 is not null (type: boolean)
>                     Statistics: Num rows: 3 Data size: 324 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: c1 (type: int), c2 (type: char(20))
>                       outputColumnNames: _col0, _col1
>                       Select Vectorization:
>                           className: VectorSelectOperator
>                           native: true
>                           projectedOutputColumns: [0, 1]
>                       Statistics: Num rows: 3 Data size: 324 Basic stats: 
> COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col1 (type: char(20))
>                           1 _col1 (type: char(20))
>                         Map Join Vectorization:
>                             className: VectorMapJoinInnerStringOperator
>                             native: true
>                             nativeConditionsMet: 
> hive.vectorized.execution.mapjoin.native.enabled IS true, 
> hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS 
> true, No nullsafe IS true, Supports Key Types IS true, Not empty key IS true, 
> When Fast Hash Table, then requires no Hybrid Hash Join IS true, Small table 
> vectorizes IS true
>                         outputColumnNames: _col0, _col1, _col2, _col3
>                         input vertices:
>                           0 Map 1
>                         Statistics: Num rows: 3 Data size: 323 Basic stats: 
> COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: int)
>                           sort order: +
>                           Reduce Sink Vectorization:
>                               className: VectorReduceSinkOperator
>                               native: false
>                               nativeConditionsMet: 
> hive.vectorized.execution.reducesink.new.enabled IS true, 
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE 
> IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, 
> BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                               nativeConditionsNotMet: Uniform Hash IS false
>                           Statistics: Num rows: 3 Data size: 323 Basic stats: 
> COMPLETE Column stats: NONE
>                           value expressions: _col1 (type: char(10)), _col2 
> (type: int), _col3 (type: char(20))
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 3 
>             Execution mode: vectorized, llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 
> (type: char(10)), VALUE._col1 (type: int), VALUE._col2 (type: char(20))
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Select Vectorization:
>                     className: VectorSelectOperator
>                     native: true
>                     projectedOutputColumns: [0, 1, 2, 3]
>                 Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE 
> Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   File Sink Vectorization:
>                       className: VectorFileSinkOperator
>                       native: false
>                   Statistics: Num rows: 3 Data size: 323 Basic stats: 
> COMPLETE Column stats: NONE
>                   table:
>                       input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
>  {code}
> EXPLAIN VECTORIZATION EXPRESSION
> Notice the predicateExpression in this example.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: vector_interval_2
>                   Statistics: Num rows: 2 Data size: 788 Basic stats: 
> COMPLETE Column stats: NONE
>                   TableScan Vectorization:
>                       native: true
>                       projectedOutputColumns: [0, 1, 2, 3, 4, 5]
>                   Filter Operator
>                     Filter Vectorization:
>                         className: VectorFilterOperator
>                         native: true
>                         predicateExpression: FilterExprAndExpr(children: 
> FilterTimestampScalarEqualTimestampColumn(val 2001-01-01 01:02:03.0, col 
> 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) 
> -> 6:timestamp) -> boolean, FilterTimestampScalarNotEqualTimestampColumn(val 
> 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 
> 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampScalarLessEqualTimestampColumn(val 2001-01-01 01:02:03.0, col 
> 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) 
> -> 6:timestamp) -> boolean, FilterTimestampScalarLessTimestampColumn(val 
> 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 
> 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampScalarGreaterEqualTimestampColumn(val 2001-01-01 01:02:03.0, 
> col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 
> 01:02:03.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampScalarGreaterTimestampColumn(val 2001-01-01 01:02:03.0, col 
> 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 
> 01:02:04.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColEqualTimestampScalar(col 6, val 2001-01-01 
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 
> 01:02:03.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColNotEqualTimestampScalar(col 6, val 2001-01-01 
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 
> 01:02:04.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColGreaterEqualTimestampScalar(col 6, val 2001-01-01 
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 
> 01:02:03.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColGreaterTimestampScalar(col 6, val 2001-01-01 
> 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 
> 01:02:04.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColLessEqualTimestampScalar(col 6, val 2001-01-01 
> 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 
> 01:02:03.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColLessTimestampScalar(col 6, val 2001-01-01 
> 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 
> 01:02:04.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColEqualTimestampColumn(col 0, col 6)(children: 
> DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 
> 6:timestamp) -> boolean, FilterTimestampColNotEqualTimestampColumn(col 0, col 
> 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) 
> -> 6:timestamp) -> boolean, FilterTimestampColLessEqualTimestampColumn(col 0, 
> col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 
> 01:02:03.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColLessTimestampColumn(col 0, col 6)(children: 
> DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 
> 6:timestamp) -> boolean, FilterTimestampColGreaterEqualTimestampColumn(col 0, 
> col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 
> 01:02:03.000000000) -> 6:timestamp) -> boolean, 
> FilterTimestampColGreaterTimestampColumn(col 0, col 6)(children: 
> DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 
> 6:timestamp) -> boolean) -> boolean
>                     predicate: ((2001-01-01 01:02:03.0 = (dt + 0 
> 01:02:03.000000000)) and (2001-01-01 01:02:03.0 <> (dt + 0 
> 01:02:04.000000000)) and (2001-01-01 01:02:03.0 <= (dt + 0 
> 01:02:03.000000000)) and (2001-01-01 01:02:03.0 < (dt + 0 
> 01:02:04.000000000)) and (2001-01-01 01:02:03.0 >= (dt - 0 
> 01:02:03.000000000)) and (2001-01-01 01:02:03.0 > (dt - 0 
> 01:02:04.000000000)) and ((dt + 0 01:02:03.000000000) = 2001-01-01 
> 01:02:03.0) and ((dt + 0 01:02:04.000000000) <> 2001-01-01 01:02:03.0) and 
> ((dt + 0 01:02:03.000000000) >= 2001-01-01 01:02:03.0) and ((dt + 0 
> 01:02:04.000000000) > 2001-01-01 01:02:03.0) and ((dt - 0 01:02:03.000000000) 
> <= 2001-01-01 01:02:03.0) and ((dt - 0 01:02:04.000000000) < 2001-01-01 
> 01:02:03.0) and (ts = (dt + 0 01:02:03.000000000)) and (ts <> (dt + 0 
> 01:02:04.000000000)) and (ts <= (dt + 0 01:02:03.000000000)) and (ts < (dt + 
> 0 01:02:04.000000000)) and (ts >= (dt - 0 01:02:03.000000000)) and (ts > (dt 
> - 0 01:02:04.000000000))) (type: boolean)
>                     Statistics: Num rows: 1 Data size: 394 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: ts (type: timestamp)
>                       outputColumnNames: _col0
>                       Select Vectorization:
>                           className: VectorSelectOperator
>                           native: true
>                           projectedOutputColumns: [0]
>                       Statistics: Num rows: 1 Data size: 394 Basic stats: 
> COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: timestamp)
>                         sort order: +
>                         Reduce Sink Vectorization:
>                             className: VectorReduceSinkOperator
>                             native: false
>                             nativeConditionsMet: 
> hive.vectorized.execution.reducesink.new.enabled IS true, 
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE 
> IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, 
> BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                             nativeConditionsNotMet: Uniform Hash IS false
>                         Statistics: Num rows: 1 Data size: 394 Basic stats: 
> COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
> ... 
> {code}
> The standard @Explain Annotation Type is used.  A new 'vectorization' 
> annotation marks each new class and method.
> Works for FORMATTED, like other non-vectorization EXPLAIN variations.
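
The option rules in the quoted description (an optional ONLY, an optional level, SUMMARY as the default level, and each level including the ones before it) can be sketched in Python as follows. This is an illustration only, not Hive's actual grammar; the regular expression and the function name parse_vectorization_clause are assumptions made for the example.

```python
import re

# Levels in increasing order of detail; each one includes all earlier levels.
LEVELS = ["SUMMARY", "OPERATOR", "EXPRESSION", "DETAIL"]

# Simplified stand-in for the real grammar: EXPLAIN VECTORIZATION [ONLY] [level]
CLAUSE = re.compile(
    r"^\s*EXPLAIN\s+VECTORIZATION"
    r"(?:\s+(ONLY))?"
    r"(?:\s+(SUMMARY|OPERATOR|EXPRESSION|DETAIL))?\b",
    re.IGNORECASE,
)


def parse_vectorization_clause(statement):
    """Return the effective ONLY flag and level, or None if the clause is absent."""
    match = CLAUSE.match(statement)
    if match is None:
        return None
    only = match.group(1) is not None            # default: ONLY is off
    level = (match.group(2) or "SUMMARY").upper()  # default level: SUMMARY
    includes = LEVELS[: LEVELS.index(level) + 1]
    return {"only": only, "level": level, "includes": includes}


if __name__ == "__main__":
    for stmt in (
        "EXPLAIN VECTORIZATION SELECT 1",
        "EXPLAIN VECTORIZATION ONLY EXPRESSION SELECT 1",
        "EXPLAIN VECTORIZATION DETAIL SELECT 1",
    ):
        print(stmt, "->", parse_vectorization_clause(stmt))
```

For example, a statement with no options resolves to the SUMMARY level without ONLY, and EXPRESSION resolves to the SUMMARY, OPERATOR and EXPRESSION information together.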

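When reading long plans such as the quoted examples, it can help to pull out only the reasons a vertex or operator was not vectorized or not native. The Python sketch below assumes the raw EXPLAIN output with one property per line (not the mail-wrapped form quoted here) and assumes vertex headers of the form "Map N" or "Reducer N"; find_reasons is a hypothetical helper, not a Hive API.

```python
import re

# A vertex header is a line holding only "Map <n>" or "Reducer <n>".
VERTEX = re.compile(r"^\s*((?:Map|Reducer) \d+)\s*$")
# The two properties in the quoted examples that explain a fallback.
REASON = re.compile(r"^\s*(notVectorizedReason|nativeConditionsNotMet):\s*(.+?)\s*$")


def find_reasons(explain_text):
    """Return (vertex, property, value) for each fallback reason found."""
    vertex = None
    found = []
    for line in explain_text.splitlines():
        header = VERTEX.match(line)
        if header:
            vertex = header.group(1)
            continue
        reason = REASON.match(line)
        if reason:
            found.append((vertex, reason.group(1), reason.group(2)))
    return found


# Lines taken from the quoted examples, unwrapped and combined for illustration.
SAMPLE = """\
        Map 2
                          Reduce Sink Vectorization:
                              className: VectorReduceSinkOperator
                              native: false
                              nativeConditionsNotMet: Uniform Hash IS false
        Reducer 3
            Reduce Vectorization:
                notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct<count:bigint,sum:double,input:int> of Column[VALUE._col2] not supported
                vectorized: false
"""

if __name__ == "__main__":
    for vertex, key, value in find_reasons(SAMPLE):
        print(f"{vertex}: {key}: {value}")
```

The same scan could be applied to the FORMATTED output mentioned above, although that would call for a JSON parser rather than line matching.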


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
