wangmeng created HIVE-7277:
------------------------------
Summary: how to decide reduce numbers according to the input
size of reduce stage rather than the input size of map stage?
Key: HIVE-7277
URL: https://issues.apache.org/jira/browse/HIVE-7277
Project: Hive
Issue Type: New Feature
Reporter: wangmeng
Fix For: 0.13.0
As we know, Hive currently decides the number of reducers from the map-stage
input size alone: input size of map / hive.exec.reducers.bytes.per.reducer
(default 1G).
But I think the output size of the map stage may differ greatly from the
original input size, so this strategy for deciding the number of reducers
may be improper.
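
To make the concern concrete, here is a rough sketch of that rule in Python.
It is not Hive's actual code: the function name, reading "1G" as 10^9 bytes,
the cap of 999 (standing in for hive.exec.reducers.max) and the data sizes
are all assumptions chosen for illustration.

```python
def estimate_reducers(total_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Ceiling of total_bytes / bytes_per_reducer, clamped to [1, max_reducers].

    bytes_per_reducer stands in for hive.exec.reducers.bytes.per.reducer;
    max_reducers is an assumed cap, not a value taken from the issue.
    """
    # Integer ceiling division avoids floating-point rounding.
    estimate = (total_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, estimate))


# Hypothetical job: 100 GB read by the mappers, 2 GB left after a selective filter.
map_input_bytes = 100 * 10**9
map_output_bytes = 2 * 10**9

by_input = estimate_reducers(map_input_bytes)    # what the current rule sees
by_output = estimate_reducers(map_output_bytes)  # what the reducers actually receive

print(by_input, by_output)
```

With these made-up sizes the input-based rule asks for far more reducers than
the map output would justify.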
So is there any feature that can decide the number of reducers according
to the output of the map stage? Thanks.
As far as I know, the reduce stage actually begins as soon as some map tasks
have finished, rather than waiting until the whole map stage has finished, so
I think it is also improper to decide the number of reducers only after the
whole map stage has finished.
As someone pointed out, we could use the output size of the earliest
finished map tasks to estimate the total number of reducers. However, Hive
now uses filter pushdown (WHERE), which may result in big differences in
output size between map tasks.
So this estimation is improper too.
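
A small sketch of why that extrapolation can mislead, again in Python. The
function name and the per-task output sizes are hypothetical; they only
illustrate skew between map tasks and are not measurements from a real job.

```python
def extrapolate_map_output(finished_output_bytes, total_map_tasks):
    """Scale the mean output of the finished map tasks up to all map tasks."""
    if not finished_output_bytes:
        raise ValueError("need at least one finished map task")
    return sum(finished_output_bytes) * total_map_tasks // len(finished_output_bytes)


# Hypothetical job with 10 map tasks: the filter removes almost everything in
# eight splits (10 MB of output each) but very little in two (2 GB each).
all_outputs = [10_000_000] * 8 + [2_000_000_000] * 2

# Suppose the three earliest finishers all come from the heavily filtered splits.
early = all_outputs[:3]

estimated_total = extrapolate_map_output(early, len(all_outputs))
actual_total = sum(all_outputs)

print(estimated_total, actual_total)
```

In this made-up case the estimate from the early tasks is about 40 times
smaller than the real map output, so a reducer count derived from it would be
far too low.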
Thanks.
--
This message was sent by Atlassian JIRA
(v6.2#6252)