Krisztian Kasa created HIVE-22489:
-------------------------------------
Summary: Reduce Sink orders nulls first
Key: HIVE-22489
URL: https://issues.apache.org/jira/browse/HIVE-22489
Project: Hive
Issue Type: Bug
Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
When the property hive.default.nulls.last is set to true and no null order is
explicitly specified in the ORDER BY clause of the query null ordering should
be NULLS LAST.
But some of the Reduce Sink operators still orders null first.
{code}
SET hive.default.nulls.last=true;
EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key =
src2.key) ORDER BY src1.key LIMIT 5;
{code}
{code}
PREHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key =
src2.key) ORDER BY src1.key
PREHOOK: type: QUERY
PREHOOK: Input: default@src
#### A masked pattern was here ####
POSTHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key =
src2.key) ORDER BY src1.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
#### A masked pattern was here ####
OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
FROM (SELECT `key`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t0`
INNER JOIN (SELECT `key`, `value`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
ORDER BY `t0`.`key`
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: src1
filterExpr: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 43500 Basic stats:
COMPLETE Column stats: COMPLETE
GatherStats: false
Filter Operator
isSamplingPred: false
predicate: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 43500 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 500 Data size: 43500 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 500 Data size: 43500 Basic stats:
COMPLETE Column stats: COMPLETE
tag: 0
auto parallelism: true
Execution mode: vectorized, llap
LLAP IO: no inputs
Path -> Alias:
#### A masked pattern was here ####
Path -> Partition:
#### A masked pattern was here ####
Partition
base file name: src
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns key,value
columns.comments 'default','default'
columns.types string:string
#### A masked pattern was here ####
name default.src
numFiles 1
numRows 500
rawDataSize 5312
serialization.ddl struct src { string key, string value}
serialization.format 1
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 5812
#### A masked pattern was here ####
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns key,value
columns.comments 'default','default'
columns.types string:string
#### A masked pattern was here ####
name default.src
numFiles 1
numRows 500
rawDataSize 5312
serialization.ddl struct src { string key, string value}
serialization.format 1
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 5812
#### A masked pattern was here ####
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.src
name: default.src
Truncated Path -> Alias:
/src [src1]
Map 4
Map Operator Tree:
TableScan
alias: src2
filterExpr: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
GatherStats: false
Filter Operator
isSamplingPred: false
predicate: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: key (type: string), value (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
tag: 1
value expressions: _col1 (type: string)
auto parallelism: true
Execution mode: vectorized, llap
LLAP IO: no inputs
Path -> Alias:
#### A masked pattern was here ####
Path -> Partition:
#### A masked pattern was here ####
Partition
base file name: src
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns key,value
columns.comments 'default','default'
columns.types string:string
#### A masked pattern was here ####
name default.src
numFiles 1
numRows 500
rawDataSize 5312
serialization.ddl struct src { string key, string value}
serialization.format 1
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 5812
#### A masked pattern was here ####
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns key,value
columns.comments 'default','default'
columns.types string:string
#### A masked pattern was here ####
name default.src
numFiles 1
numRows 500
rawDataSize 5312
serialization.ddl struct src { string key, string value}
serialization.format 1
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 5812
#### A masked pattern was here ####
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.src
name: default.src
Truncated Path -> Alias:
/src [src2]
Reducer 2
Execution mode: llap
Needs Tagging: false
Reduce Operator Tree:
Merge Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col2
Position of Big Table: 1
Statistics: Num rows: 791 Data size: 140798 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: string), _col2 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 791 Data size: 140798 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Statistics: Num rows: 791 Data size: 140798 Basic stats:
COMPLETE Column stats: COMPLETE
tag: -1
value expressions: _col1 (type: string)
auto parallelism: false
Reducer 3
Execution mode: vectorized, llap
Needs Tagging: false
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: string), VALUE._col0
(type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 791 Data size: 140798 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
GlobalTableId: 0
#### A masked pattern was here ####
NumFilesPerFileSink: 1
Statistics: Num rows: 791 Data size: 140798 Basic stats:
COMPLETE Column stats: COMPLETE
#### A masked pattern was here ####
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
properties:
columns _col0,_col1
columns.types string:string
escape.delim \
hive.serialization.extend.additional.nesting.levels true
serialization.escape.crlf true
serialization.format 1
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TotalFiles: 1
GatherStats: false
MultiFileSpray: false
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)