This is an automated email from the ASF dual-hosted git repository.
gopalv pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hive.git
The following commit(s) were added to refs/heads/master by this push:
new b9b1271 HIVE-22214: Explain vectorization should disable user level explain (Addendum)
b9b1271 is described below
commit b9b12715afef55bfa43992313652aeb03aaedf3f
Author: Gopal V <[email protected]>
AuthorDate: Tue Sep 24 08:05:52 2019 -0700
HIVE-22214: Explain vectorization should disable user level explain (Addendum)
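Context for the golden-file churn below: "user level explain" is the compact plan
rendering controlled by hive.explain.user (on by default on Tez), and HIVE-22214
makes EXPLAIN VECTORIZATION fall back to the full operator-level plan instead of
that compact form. A minimal sketch of how to observe this, using a query from the
affected qtests (the settings and EXPLAIN level here are illustrative assumptions;
the individual tests vary):

    -- with user-level explain on, EXPLAIN VECTORIZATION now still prints the
    -- full PLAN VECTORIZATION / STAGE PLANS output seen in the diffs below
    set hive.explain.user=true;
    set hive.vectorized.execution.enabled=true;
    EXPLAIN VECTORIZATION DETAIL
    SELECT key FROM src GROUP BY key ORDER BY key LIMIT 5;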
---
.../test/results/clientpositive/tez/topnkey.q.out | 132 +++--
.../tez/vector_join_part_col_char.q.out | 139 +++--
.../clientpositive/tez/vector_topnkey.q.out | 619 +++++++++++++++++----
3 files changed, 725 insertions(+), 165 deletions(-)
diff --git a/ql/src/test/results/clientpositive/tez/topnkey.q.out b/ql/src/test/results/clientpositive/tez/topnkey.q.out
index e786c39..3267f79 100644
--- a/ql/src/test/results/clientpositive/tez/topnkey.q.out
+++ b/ql/src/test/results/clientpositive/tez/topnkey.q.out
@@ -118,46 +118,102 @@ SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key)
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
POSTHOOK: Output: hdfs://### HDFS PATH ###
-Plan optimized by CBO.
+PLAN VECTORIZATION:
+ enabled: false
+ enabledConditionsNotMet: [hive.vectorized.execution.enabled IS false]
-Vertex dependency in root stage
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
-Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
-Stage-0
- Fetch Operator
- limit:5
- Stage-1
- Reducer 3
- File Output Operator [FS_13]
- Limit [LIM_12] (rows=5 width=178)
- Number of rows:5
- Select Operator [SEL_11] (rows=791 width=178)
- Output:["_col0","_col1"]
- <-Reducer 2 [SIMPLE_EDGE]
- SHUFFLE [RS_10]
- Select Operator [SEL_9] (rows=791 width=178)
- Output:["_col0","_col1"]
- Merge Join Operator [MERGEJOIN_28] (rows=791 width=178)
- Conds:RS_6._col0=RS_7._col0(Inner),Output:["_col0","_col2"]
- <-Map 1 [SIMPLE_EDGE]
- SHUFFLE [RS_6]
- PartitionCols:_col0
- Select Operator [SEL_2] (rows=500 width=87)
- Output:["_col0"]
- Filter Operator [FIL_16] (rows=500 width=87)
- predicate:key is not null
- TableScan [TS_0] (rows=500 width=87)
- default@src,src1,Tbl:COMPLETE,Col:COMPLETE,Output:["key"]
- <-Map 4 [SIMPLE_EDGE]
- SHUFFLE [RS_7]
- PartitionCols:_col0
- Select Operator [SEL_5] (rows=500 width=178)
- Output:["_col0","_col1"]
- Filter Operator [FIL_17] (rows=500 width=178)
- predicate:key is not null
- TableScan [TS_3] (rows=500 width=178)
- default@src,src2,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
+ Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+ Vertices:
+ Map 1
+ Map Operator Tree:
+ TableScan
+ alias: src1
+ filterExpr: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: key (type: string)
+ outputColumnNames: _col0
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Map 4
+ Map Operator Tree:
+ TableScan
+ alias: src2
+ filterExpr: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: key (type: string), value (type: string)
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col1 (type: string)
+ Reducer 2
+ Reduce Operator Tree:
+ Merge Join Operator
+ condition map:
+ Inner Join 0 to 1
+ keys:
+ 0 _col0 (type: string)
+ 1 _col0 (type: string)
+ outputColumnNames: _col0, _col2
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: string), _col2 (type: string)
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ TopN Hash Memory Usage: 0.1
+ value expressions: _col1 (type: string)
+ Reducer 3
+ Reduce Operator Tree:
+ Select Operator
+ expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ Limit
+ Number of rows: 5
+ Statistics: Num rows: 5 Data size: 890 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 5 Data size: 890 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: 5
+ Processor Tree:
+ ListSink
PREHOOK: query: SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key LIMIT 5
PREHOOK: type: QUERY
diff --git a/ql/src/test/results/clientpositive/tez/vector_join_part_col_char.q.out b/ql/src/test/results/clientpositive/tez/vector_join_part_col_char.q.out
index 24f7508..aa987f7 100644
--- a/ql/src/test/results/clientpositive/tez/vector_join_part_col_char.q.out
+++ b/ql/src/test/results/clientpositive/tez/vector_join_part_col_char.q.out
@@ -115,39 +115,116 @@ POSTHOOK: Input: default@char_tbl2
POSTHOOK: Input: default@char_tbl2@gpa=3
POSTHOOK: Input: default@char_tbl2@gpa=3.5
POSTHOOK: Output: hdfs://### HDFS PATH ###
-Plan optimized by CBO.
+PLAN VECTORIZATION:
+ enabled: true
+ enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
-Vertex dependency in root stage
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
-Stage-0
- Fetch Operator
- limit:-1
- Stage-1
- Reducer 2
- File Output Operator [FS_10]
- Merge Join Operator [MERGEJOIN_21] (rows=2 width=429)
- Conds:RS_23._col2=RS_28._col2(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5"]
- <-Map 1 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_23]
- PartitionCols:_col2
- Select Operator [SEL_22] (rows=2 width=237)
- Output:["_col0","_col1","_col2"]
- TableScan [TS_0] (rows=2 width=237)
- default@char_tbl1,c1,Tbl:COMPLETE,Col:COMPLETE,Output:["name","age"]
- Dynamic Partitioning Event Operator [EVENT_26] (rows=1 width=99)
- Group By Operator [GBY_25] (rows=1 width=99)
- Output:["_col0"],keys:_col0
- Select Operator [SEL_24] (rows=2 width=237)
- Output:["_col0"]
- Please refer to the previous Select Operator [SEL_22]
- <-Map 3 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_28]
- PartitionCols:_col2
- Select Operator [SEL_27] (rows=2 width=192)
- Output:["_col0","_col1","_col2"]
- TableScan [TS_3] (rows=2 width=192)
- default@char_tbl2,c2,Tbl:COMPLETE,Col:COMPLETE,Output:["name","age"]
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+ Vertices:
+ Map 1
+ Map Operator Tree:
+ TableScan
+ alias: c1
+ filterExpr: gpa is not null (type: boolean)
+ Statistics: Num rows: 2 Data size: 474 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: name (type: string), age (type: int), gpa (type: char(50))
+ outputColumnNames: _col0, _col1, _col2
+ Statistics: Num rows: 2 Data size: 474 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col2 (type: char(50))
+ sort order: +
+ Map-reduce partition columns: _col2 (type: char(50))
+ Statistics: Num rows: 2 Data size: 474 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col0 (type: string), _col1 (type: int)
+ Select Operator
+ expressions: _col2 (type: char(50))
+ outputColumnNames: _col0
+ Statistics: Num rows: 2 Data size: 474 Basic stats: COMPLETE Column stats: COMPLETE
+ Group By Operator
+ keys: _col0 (type: char(50))
+ minReductionHashAggr: 0.5
+ mode: hash
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 99 Basic stats: COMPLETE Column stats: COMPLETE
+ Dynamic Partitioning Event Operator
+ Target column: gpa (char(5))
+ Target Input: c2
+ Partition key expr: gpa
+ Statistics: Num rows: 1 Data size: 99 Basic stats: COMPLETE Column stats: COMPLETE
+ Target Vertex: Map 3
+ Execution mode: vectorized
+ Map Vectorization:
+ enabled: true
+ enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
+ inputFormatFeatureSupport: [DECIMAL_64]
+ featureSupportInUse: [DECIMAL_64]
+ inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ Map 3
+ Map Operator Tree:
+ TableScan
+ alias: c2
+ filterExpr: gpa is not null (type: boolean)
+ Statistics: Num rows: 2 Data size: 384 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: name (type: string), age (type: int), gpa (type: char(5))
+ outputColumnNames: _col0, _col1, _col2
+ Statistics: Num rows: 2 Data size: 384 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col2 (type: char(50))
+ sort order: +
+ Map-reduce partition columns: _col2 (type: char(50))
+ Statistics: Num rows: 2 Data size: 384 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col0 (type: string), _col1 (type: int)
+ Execution mode: vectorized
+ Map Vectorization:
+ enabled: true
+ enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
+ inputFormatFeatureSupport: [DECIMAL_64]
+ featureSupportInUse: [DECIMAL_64]
+ inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
+ allNative: true
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ Reducer 2
+ Reduce Operator Tree:
+ Merge Join Operator
+ condition map:
+ Inner Join 0 to 1
+ keys:
+ 0 _col2 (type: char(50))
+ 1 _col2 (type: char(50))
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
+ Statistics: Num rows: 2 Data size: 858 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 2 Data size: 858 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+ MergeJoin Vectorization:
+ enabled: false
+ enableConditionsNotMet: Vectorizing MergeJoin Supported IS false
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
PREHOOK: query: select c1.name, c1.age, c1.gpa, c2.name, c2.age, c2.gpa from char_tbl1 c1 join char_tbl2 c2 on (c1.gpa = c2.gpa)
PREHOOK: type: QUERY
diff --git a/ql/src/test/results/clientpositive/tez/vector_topnkey.q.out b/ql/src/test/results/clientpositive/tez/vector_topnkey.q.out
index aecd7c7..b6760db 100644
--- a/ql/src/test/results/clientpositive/tez/vector_topnkey.q.out
+++ b/ql/src/test/results/clientpositive/tez/vector_topnkey.q.out
@@ -8,37 +8,181 @@ SELECT key, SUM(CAST(SUBSTR(value,5) AS INT)) FROM src GROUP BY key ORDER BY key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
POSTHOOK: Output: hdfs://### HDFS PATH ###
-Plan optimized by CBO.
+PLAN VECTORIZATION:
+ enabled: true
+ enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
-Vertex dependency in root stage
-Reducer 2 <- Map 1 (SIMPLE_EDGE)
-Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
-Stage-0
- Fetch Operator
- limit:5
- Stage-1
- Reducer 3 vectorized
- File Output Operator [FS_20]
- Limit [LIM_19] (rows=5 width=95)
- Number of rows:5
- Select Operator [SEL_18] (rows=250 width=95)
- Output:["_col0","_col1"]
- <-Reducer 2 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_17]
- Group By Operator [GBY_16] (rows=250 width=95)
- Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0
- <-Map 1 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_15]
- PartitionCols:_col0
- Group By Operator [GBY_14] (rows=250 width=95)
- Output:["_col0","_col1"],aggregations:["sum(_col1)"],keys:_col0
- Top N Key Operator [TNK_13] (rows=500 width=178)
- keys:_col0,sort order:+,top n:5
- Select Operator [SEL_12] (rows=500 width=178)
- Output:["_col0","_col1"]
- TableScan [TS_0] (rows=500 width=178)
- default@src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (SIMPLE_EDGE)
+ Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+ Vertices:
+ Map 1
+ Map Operator Tree:
+ TableScan
+ alias: src
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan Vectorization:
+ native: true
+ vectorizationSchemaColumns: [0:key:string, 1:value:string, 2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>]
+ Select Operator
+ expressions: key (type: string), UDFToInteger(substr(value, 5)) (type: int)
+ outputColumnNames: _col0, _col1
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0, 4]
+ selectExpressions: CastStringToLong(col 3:string)(children: StringSubstrColStart(col 1:string, start 4) -> 3:string) -> 4:int
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ Top N Key Operator
+ sort order: +
+ keys: _col0 (type: string)
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ top n: 5
+ Top N Key Vectorization:
+ className: VectorTopNKeyOperator
+ keyExpressions: col 0:string
+ native: true
+ Group By Operator
+ aggregations: sum(_col1)
+ Group By Vectorization:
+ aggregators: VectorUDAFSumLong(col 4:int) -> bigint
+ className: VectorGroupByOperator
+ groupByMode: HASH
+ keyExpressions: col 0:string
+ native: false
+ vectorProcessingMode: HASH
+ projectedOutputColumnNums: [0]
+ keys: _col0 (type: string)
+ minReductionHashAggr: 0.5
+ mode: hash
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Reduce Sink Vectorization:
+ className: VectorReduceSinkStringOperator
+ keyColumns: 0:string
+ native: true
+ nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
+ valueColumns: 1:bigint
+ Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+ TopN Hash Memory Usage: 0.1
+ value expressions: _col1 (type: bigint)
+ Execution mode: vectorized
+ Map Vectorization:
+ enabled: true
+ enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
+ inputFormatFeatureSupport: [DECIMAL_64]
+ featureSupportInUse: [DECIMAL_64]
+ inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 2
+ includeColumns: [0, 1]
+ dataColumns: key:string, value:string
+ partitionColumnCount: 0
+ scratchColumnTypeNames: [string, bigint]
+ Reducer 2
+ Execution mode: vectorized
+ Reduce Vectorization:
+ enabled: true
+ enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
+ reduceColumnNullOrder: z
+ reduceColumnSortOrder: +
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 2
+ dataColumns: KEY._col0:string, VALUE._col0:bigint
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Reduce Operator Tree:
+ Group By Operator
+ aggregations: sum(VALUE._col0)
+ Group By Vectorization:
+ aggregators: VectorUDAFSumLong(col 1:bigint) -> bigint
+ className: VectorGroupByOperator
+ groupByMode: MERGEPARTIAL
+ keyExpressions: col 0:string
+ native: false
+ vectorProcessingMode: MERGE_PARTIAL
+ projectedOutputColumnNums: [0]
+ keys: KEY._col0 (type: string)
+ mode: mergepartial
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Reduce Sink Vectorization:
+ className: VectorReduceSinkObjectHashOperator
+ keyColumns: 0:string
+ native: true
+ nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
+ valueColumns: 1:bigint
+ Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+ TopN Hash Memory Usage: 0.1
+ value expressions: _col1 (type: bigint)
+ Reducer 3
+ Execution mode: vectorized
+ Reduce Vectorization:
+ enabled: true
+ enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
+ reduceColumnNullOrder: z
+ reduceColumnSortOrder: +
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 2
+ dataColumns: KEY.reducesinkkey0:string, VALUE._col0:bigint
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Reduce Operator Tree:
+ Select Operator
+ expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: bigint)
+ outputColumnNames: _col0, _col1
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0, 1]
+ Statistics: Num rows: 250 Data size: 23750 Basic stats: COMPLETE Column stats: COMPLETE
+ Limit
+ Number of rows: 5
+ Limit Vectorization:
+ className: VectorLimitOperator
+ native: true
+ Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
+ Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: 5
+ Processor Tree:
+ ListSink
PREHOOK: query: SELECT key, SUM(CAST(SUBSTR(value,5) AS INT)) FROM src GROUP BY key ORDER BY key LIMIT 5
PREHOOK: type: QUERY
@@ -63,37 +207,172 @@ SELECT key FROM src GROUP BY key ORDER BY key LIMIT 5
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
POSTHOOK: Output: hdfs://### HDFS PATH ###
-Plan optimized by CBO.
+PLAN VECTORIZATION:
+ enabled: true
+ enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
+
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
-Vertex dependency in root stage
-Reducer 2 <- Map 1 (SIMPLE_EDGE)
-Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (SIMPLE_EDGE)
+ Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+ Vertices:
+ Map 1
+ Map Operator Tree:
+ TableScan
+ alias: src
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan Vectorization:
+ native: true
+ vectorizationSchemaColumns: [0:key:string, 1:value:string, 2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>]
+ Select Operator
+ expressions: key (type: string)
+ outputColumnNames: key
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0]
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Top N Key Operator
+ sort order: +
+ keys: key (type: string)
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ top n: 5
+ Top N Key Vectorization:
+ className: VectorTopNKeyOperator
+ keyExpressions: col 0:string
+ native: true
+ Group By Operator
+ Group By Vectorization:
+ className: VectorGroupByOperator
+ groupByMode: HASH
+ keyExpressions: col 0:string
+ native: false
+ vectorProcessingMode: HASH
+ projectedOutputColumnNums: []
+ keys: key (type: string)
+ minReductionHashAggr: 0.5
+ mode: hash
+ outputColumnNames: _col0
+ Statistics: Num rows: 250 Data size: 21750 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Reduce Sink Vectorization:
+ className: VectorReduceSinkStringOperator
+ keyColumns: 0:string
+ native: true
+ nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
+ Statistics: Num rows: 250 Data size: 21750 Basic stats: COMPLETE Column stats: COMPLETE
+ TopN Hash Memory Usage: 0.1
+ Execution mode: vectorized
+ Map Vectorization:
+ enabled: true
+ enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
+ inputFormatFeatureSupport: [DECIMAL_64]
+ featureSupportInUse: [DECIMAL_64]
+ inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 2
+ includeColumns: [0]
+ dataColumns: key:string, value:string
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Reducer 2
+ Execution mode: vectorized
+ Reduce Vectorization:
+ enabled: true
+ enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
+ reduceColumnNullOrder: z
+ reduceColumnSortOrder: +
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 1
+ dataColumns: KEY._col0:string
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Reduce Operator Tree:
+ Group By Operator
+ Group By Vectorization:
+ className: VectorGroupByOperator
+ groupByMode: MERGEPARTIAL
+ keyExpressions: col 0:string
+ native: false
+ vectorProcessingMode: MERGE_PARTIAL
+ projectedOutputColumnNums: []
+ keys: KEY._col0 (type: string)
+ mode: mergepartial
+ outputColumnNames: _col0
+ Statistics: Num rows: 250 Data size: 21750 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Reduce Sink Vectorization:
+ className: VectorReduceSinkObjectHashOperator
+ keyColumns: 0:string
+ native: true
+ nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
+ Statistics: Num rows: 250 Data size: 21750 Basic stats: COMPLETE Column stats: COMPLETE
+ TopN Hash Memory Usage: 0.1
+ Reducer 3
+ Execution mode: vectorized
+ Reduce Vectorization:
+ enabled: true
+ enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
+ reduceColumnNullOrder: z
+ reduceColumnSortOrder: +
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 1
+ dataColumns: KEY.reducesinkkey0:string
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Reduce Operator Tree:
+ Select Operator
+ expressions: KEY.reducesinkkey0 (type: string)
+ outputColumnNames: _col0
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0]
+ Statistics: Num rows: 250 Data size: 21750 Basic stats: COMPLETE Column stats: COMPLETE
+ Limit
+ Number of rows: 5
+ Limit Vectorization:
+ className: VectorLimitOperator
+ native: true
+ Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
+ Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-Stage-0
- Fetch Operator
- limit:5
- Stage-1
- Reducer 3 vectorized
- File Output Operator [FS_19]
- Limit [LIM_18] (rows=5 width=87)
- Number of rows:5
- Select Operator [SEL_17] (rows=250 width=87)
- Output:["_col0"]
- <-Reducer 2 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_16]
- Group By Operator [GBY_15] (rows=250 width=87)
- Output:["_col0"],keys:KEY._col0
- <-Map 1 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_14]
- PartitionCols:_col0
- Group By Operator [GBY_13] (rows=250 width=87)
- Output:["_col0"],keys:key
- Top N Key Operator [TNK_12] (rows=500 width=87)
- keys:key,sort order:+,top n:5
- Select Operator [SEL_11] (rows=500 width=87)
- Output:["key"]
- TableScan [TS_0] (rows=500 width=87)
- default@src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key"]
+ Stage: Stage-0
+ Fetch Operator
+ limit: 5
+ Processor Tree:
+ ListSink
PREHOOK: query: SELECT key FROM src GROUP BY key ORDER BY key LIMIT 5
PREHOOK: type: QUERY
@@ -118,46 +118,194 @@ SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key)
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
POSTHOOK: Output: hdfs://### HDFS PATH ###
-Plan optimized by CBO.
+PLAN VECTORIZATION:
+ enabled: true
+ enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
+
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
-Vertex dependency in root stage
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
-Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
+ Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+ Vertices:
+ Map 1
+ Map Operator Tree:
+ TableScan
+ alias: src1
+ filterExpr: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan Vectorization:
+ native: true
+ vectorizationSchemaColumns: [0:key:string, 1:value:string, 2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>]
+ Filter Operator
+ Filter Vectorization:
+ className: VectorFilterOperator
+ native: true
+ predicateExpression: SelectColumnIsNotNull(col 0:string)
+ predicate: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: key (type: string)
+ outputColumnNames: _col0
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0]
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Reduce Sink Vectorization:
+ className: VectorReduceSinkStringOperator
+ keyColumns: 0:string
+ native: true
+ nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
+ Execution mode: vectorized
+ Map Vectorization:
+ enabled: true
+ enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
+ inputFormatFeatureSupport: [DECIMAL_64]
+ featureSupportInUse: [DECIMAL_64]
+ inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
+ allNative: true
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 2
+ includeColumns: [0]
+ dataColumns: key:string, value:string
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Map 4
+ Map Operator Tree:
+ TableScan
+ alias: src2
+ filterExpr: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan Vectorization:
+ native: true
+ vectorizationSchemaColumns: [0:key:string, 1:value:string, 2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>]
+ Filter Operator
+ Filter Vectorization:
+ className: VectorFilterOperator
+ native: true
+ predicateExpression: SelectColumnIsNotNull(col 0:string)
+ predicate: key is not null (type: boolean)
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: key (type: string), value (type: string)
+ outputColumnNames: _col0, _col1
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0, 1]
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Reduce Sink Vectorization:
+ className: VectorReduceSinkStringOperator
+ keyColumns: 0:string
+ native: true
+ nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
+ valueColumns: 1:string
+ Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col1 (type: string)
+ Execution mode: vectorized
+ Map Vectorization:
+ enabled: true
+ enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
+ inputFormatFeatureSupport: [DECIMAL_64]
+ featureSupportInUse: [DECIMAL_64]
+ inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
+ allNative: true
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 2
+ includeColumns: [0, 1]
+ dataColumns: key:string, value:string
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Reducer 2
+ Reduce Operator Tree:
+ Merge Join Operator
+ condition map:
+ Inner Join 0 to 1
+ keys:
+ 0 _col0 (type: string)
+ 1 _col0 (type: string)
+ outputColumnNames: _col0, _col2
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: string), _col2 (type: string)
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ sort order: +
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ TopN Hash Memory Usage: 0.1
+ value expressions: _col1 (type: string)
+ MergeJoin Vectorization:
+ enabled: false
+ enableConditionsNotMet: Vectorizing MergeJoin Supported IS false
+ Reducer 3
+ Execution mode: vectorized
+ Reduce Vectorization:
+ enabled: true
+ enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
+ reduceColumnNullOrder: z
+ reduceColumnSortOrder: +
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 2
+ dataColumns: KEY.reducesinkkey0:string, VALUE._col0:string
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
+ Reduce Operator Tree:
+ Select Operator
+ expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
+ outputColumnNames: _col0, _col1
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0, 1]
+ Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
+ Limit
+ Number of rows: 5
+ Limit Vectorization:
+ className: VectorLimitOperator
+ native: true
+ Statistics: Num rows: 5 Data size: 890 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
+ Statistics: Num rows: 5 Data size: 890 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-Stage-0
- Fetch Operator
- limit:5
- Stage-1
- Reducer 3 vectorized
- File Output Operator [FS_37]
- Limit [LIM_36] (rows=5 width=178)
- Number of rows:5
- Select Operator [SEL_35] (rows=791 width=178)
- Output:["_col0","_col1"]
- <-Reducer 2 [SIMPLE_EDGE]
- SHUFFLE [RS_10]
- Select Operator [SEL_9] (rows=791 width=178)
- Output:["_col0","_col1"]
- Merge Join Operator [MERGEJOIN_28] (rows=791 width=178)
- Conds:RS_31._col0=RS_34._col0(Inner),Output:["_col0","_col2"]
- <-Map 1 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_31]
- PartitionCols:_col0
- Select Operator [SEL_30] (rows=500 width=87)
- Output:["_col0"]
- Filter Operator [FIL_29] (rows=500 width=87)
- predicate:key is not null
- TableScan [TS_0] (rows=500 width=87)
- default@src,src1,Tbl:COMPLETE,Col:COMPLETE,Output:["key"]
- <-Map 4 [SIMPLE_EDGE] vectorized
- SHUFFLE [RS_34]
- PartitionCols:_col0
- Select Operator [SEL_33] (rows=500 width=178)
- Output:["_col0","_col1"]
- Filter Operator [FIL_32] (rows=500 width=178)
- predicate:key is not null
- TableScan [TS_3] (rows=500 width=178)
- default@src,src2,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]
+ Stage: Stage-0
+ Fetch Operator
+ limit: 5
+ Processor Tree:
+ ListSink
PREHOOK: query: SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key LIMIT 5
PREHOOK: type: QUERY