[jira] [Created] (HIVE-22489) Reduce Sink orders nulls first

Krisztian Kasa (Jira) Wed, 13 Nov 2019 05:48:20 -0800

Krisztian Kasa created HIVE-22489:
-------------------------------------

             Summary: Reduce Sink orders nulls first
                 Key: HIVE-22489
                 URL: https://issues.apache.org/jira/browse/HIVE-22489
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa



When the property hive.default.nulls.last is set to true and no null order is 
explicitly specified in the ORDER BY clause of the query null ordering should 
be NULLS LAST.
But some of the Reduce Sink operators still orders null first.
{code}
SET hive.default.nulls.last=true;

EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key LIMIT 5;
{code}

{code}
PREHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key
PREHOOK: type: QUERY
PREHOOK: Input: default@src
#### A masked pattern was here ####
POSTHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
#### A masked pattern was here ####
OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
FROM (SELECT `key`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t0`
INNER JOIN (SELECT `key`, `value`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
ORDER BY `t0`.`key`
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: src1
                  filterExpr: key is not null (type: boolean)
                  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: key (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
                        tag: 0
                        auto parallelism: true
            Execution mode: vectorized, llap
            LLAP IO: no inputs
            Path -> Alias:
#### A masked pattern was here ####
            Path -> Partition:
#### A masked pattern was here ####
                Partition
                  base file name: src
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                    bucket_count -1
                    bucketing_version 2
                    column.name.delimiter ,
                    columns key,value
                    columns.comments 'default','default'
                    columns.types string:string
#### A masked pattern was here ####
                    name default.src
                    numFiles 1
                    numRows 500
                    rawDataSize 5312
                    serialization.ddl struct src { string key, string value}
                    serialization.format 1
                    serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    totalSize 5812
#### A masked pattern was here ####
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                      bucket_count -1
                      bucketing_version 2
                      column.name.delimiter ,
                      columns key,value
                      columns.comments 'default','default'
                      columns.types string:string
#### A masked pattern was here ####
                      name default.src
                      numFiles 1
                      numRows 500
                      rawDataSize 5312
                      serialization.ddl struct src { string key, string value}
                      serialization.format 1
                      serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      totalSize 5812
#### A masked pattern was here ####
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.src
                  name: default.src
            Truncated Path -> Alias:
              /src [src1]
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: src2
                  filterExpr: key is not null (type: boolean)
                  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: key (type: string), value (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
                        tag: 1
                        value expressions: _col1 (type: string)
                        auto parallelism: true
            Execution mode: vectorized, llap
            LLAP IO: no inputs
            Path -> Alias:
#### A masked pattern was here ####
            Path -> Partition:
#### A masked pattern was here ####
                Partition
                  base file name: src
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                    bucket_count -1
                    bucketing_version 2
                    column.name.delimiter ,
                    columns key,value
                    columns.comments 'default','default'
                    columns.types string:string
#### A masked pattern was here ####
                    name default.src
                    numFiles 1
                    numRows 500
                    rawDataSize 5312
                    serialization.ddl struct src { string key, string value}
                    serialization.format 1
                    serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    totalSize 5812
#### A masked pattern was here ####
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                      bucket_count -1
                      bucketing_version 2
                      column.name.delimiter ,
                      columns key,value
                      columns.comments 'default','default'
                      columns.types string:string
#### A masked pattern was here ####
                      name default.src
                      numFiles 1
                      numRows 500
                      rawDataSize 5312
                      serialization.ddl struct src { string key, string value}
                      serialization.format 1
                      serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      totalSize 5812
#### A masked pattern was here ####
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.src
                  name: default.src
            Truncated Path -> Alias:
              /src [src2]
        Reducer 2 
            Execution mode: llap
            Needs Tagging: false
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                outputColumnNames: _col0, _col2
                Position of Big Table: 1
                Statistics: Num rows: 791 Data size: 140798 Basic stats: 
COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: string), _col2 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 791 Data size: 140798 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: string)
                    null sort order: z
                    sort order: +
                    Statistics: Num rows: 791 Data size: 140798 Basic stats: 
COMPLETE Column stats: COMPLETE
                    tag: -1
                    value expressions: _col1 (type: string)
                    auto parallelism: false
        Reducer 3 
            Execution mode: vectorized, llap
            Needs Tagging: false
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 
(type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 791 Data size: 140798 Basic stats: 
COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
#### A masked pattern was here ####
                  NumFilesPerFileSink: 1
                  Statistics: Num rows: 791 Data size: 140798 Basic stats: 
COMPLETE Column stats: COMPLETE
#### A masked pattern was here ####
                  table:
                      input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      properties:
                        columns _col0,_col1
                        columns.types string:string
                        escape.delim \
                        hive.serialization.extend.additional.nesting.levels true
                        serialization.escape.crlf true
                        serialization.format 1
                        serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  TotalFiles: 1
                  GatherStats: false
                  MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-22489) Reduce Sink orders nulls first

Reply via email to