Gopal V created HIVE-16852:
------------------------------
Summary: PTF: RANK() re-evaluates order predicates on the reducer
Key: HIVE-16852
URL: https://issues.apache.org/jira/browse/HIVE-16852
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 2.1.1, 3.0.0
Reporter: Gopal V
{code}
explain select ss_item_sk, rank() over(order by cast(ss_list_price as
decimal(38,10))) as r , ss_list_price from store_sales;
STAGE PLANS:
Stage: Stage-1
Tez
DagId: root_20170608015140_7b0debb9-b14b-4150-b004-9743c6127392:3
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
Statistics: Num rows: 28800426268 Data size: 450435120648
Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: 0 (type: int), CAST( ss_list_price AS
decimal(38,10)) (type: decimal(38,10))
sort order: ++
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 28800426268 Data size: 450435120648
Basic stats: COMPLETE Column stats: COMPLETE
value expressions: ss_item_sk (type: bigint), ss_list_price
(type: double)
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2
Execution mode: llap
Reduce Operator Tree:
Select Operator
expressions: VALUE._col1 (type: bigint), VALUE._col11 (type:
double)
outputColumnNames: _col1, _col11
Statistics: Num rows: 28800426268 Data size: 8399352770616
Basic stats: COMPLETE Column stats: COMPLETE
PTF Operator
Function definitions:
Input definition
input alias: ptf_0
output shape: _col1: bigint, _col11: double
type: WINDOWING
Windowing table definition
input alias: ptf_1
name: windowingtablefunction
order by: CAST( _col11 AS decimal(38,10)) ASC NULLS
FIRST
partition by: 0
raw input shape:
window functions:
window function definition
alias: rank_window_0
arguments: CAST( _col11 AS decimal(38,10))
name: rank
window function: GenericUDAFRankEvaluator
window frame: PRECEDING(MAX)~FOLLOWING(MAX)
isPivotResult: true
Statistics: Num rows: 28800426268 Data size: 8399352770616
Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col1 (type: bigint), rank_window_0 (type:
int), _col11 (type: double)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 28800426268 Data size: 565636825720
Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 28800426268 Data size: 565636825720
Basic stats: COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
This forces the Decimal cast to be evaluated ~2x - once to produce the KEY
expression and once within the window function.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)