[jira] [Assigned] (HIVE-28729) Apply nulls order setting in Reduce Sink operator of join branches

Krisztian Kasa (Jira) Thu, 30 Jan 2025 06:05:22 -0800


     [ https://issues.apache.org/jira/browse/HIVE-28729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Krisztian Kasa reassigned HIVE-28729:
-------------------------------------

    Assignee: Krisztian Kasa

> Apply nulls order setting in Reduce Sink operator of join branches
> ------------------------------------------------------------------
>
>                 Key: HIVE-28729
>                 URL: https://issues.apache.org/jira/browse/HIVE-28729
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>
> {code:java}
> set hive.default.nulls.last=false;
> create table t1(key int, value string);
> EXPLAIN SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM t1 a INNER JOIN t1 b on a.key = b.key;
> {code}
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: key is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: key (type: int), value (type: string)
>                       outputColumnNames: key, value
>                       Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: key (type: int)
>                         null sort order: z
>                         sort order: +
>                         Map-reduce partition columns: key (type: int)
>                         Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: value (type: string)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Map 4 
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: key is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: key (type: int), value (type: string)
>                       outputColumnNames: key, value
>                       Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: key (type: int)
>                         null sort order: z
>                         sort order: +
>                         Map-reduce partition columns: key (type: int)
>                         Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: value (type: string)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2 
>             Execution mode: llap
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 key (type: int)
>                   1 key (type: int)
>                 outputColumnNames: key, value, key0, value0
>                 Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: hash(key,value,key0,value0) (type: int)
>                   outputColumnNames: $f0
>                   Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
>                   Group By Operator
>                     aggregations: sum($f0)
>                     minReductionHashAggr: 0.99
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       null sort order: 
>                       sort order: 
>                       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                       value expressions: _col0 (type: bigint)
>         Reducer 3 
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: $f0
>                 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> The nulls order in the RS operators is NULLS LAST ({{null sort order: z}}), but it should be NULLS FIRST
> because of the config {{hive.default.nulls.last=false}}:
> {code}
>         Map 1 
>             Map Operator Tree:
>             ...
>                       Reduce Output Operator
>                         key expressions: key (type: int)
>                         null sort order: z
>             ...
> {code}
> {code}
>         Map 4 
>             Map Operator Tree:
>             ...
>                       Reduce Output Operator
>                         key expressions: key (type: int)
>                         null sort order: z
>             ...
> {code} 
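For illustration, the expected mapping from the setting to the RS key's null sort order can be sketched in Java. This is a hypothetical, self-contained sketch, not Hive's implementation: the class NullSortOrderSketch and its methods are invented for this note, the session configuration is modeled as a plain Map, and treating an unset hive.default.nulls.last as true is an assumption. As in the plan above, each RS key contributes one character, z for NULLS LAST and a for NULLS FIRST.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not Hive's actual code): derives the per-key
 * "null sort order" string of a Reduce Output Operator from the
 * hive.default.nulls.last session setting.
 */
public class NullSortOrderSketch {

    // Characters printed in EXPLAIN for each RS key.
    static final char NULLS_FIRST = 'a';
    static final char NULLS_LAST = 'z';

    /** Reads the flag; an unset value is assumed to mean NULLS LAST (true). */
    static boolean defaultNullsLast(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault("hive.default.nulls.last", "true"));
    }

    /** Builds one null-order character per RS key, all following the configured default. */
    static String nullSortOrder(Map<String, String> conf, int numKeys) {
        char order = defaultNullsLast(conf) ? NULLS_LAST : NULLS_FIRST;
        StringBuilder sb = new StringBuilder(numKeys);
        for (int i = 0; i < numKeys; i++) {
            sb.append(order);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("hive.default.nulls.last", "false");
        // The join above has a single key (a.key = b.key), so one character per branch.
        System.out.println(nullSortOrder(conf, 1)); // prints "a"
    }
}
```

Under this sketch, with {{hive.default.nulls.last=false}} both join branches (Map 1 and Map 4) would be expected to show {{null sort order: a}} instead of {{z}}.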



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
