[ https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041741#comment-14041741 ]
wangmeng commented on HIVE-7277:
--------------------------------
As far as I know, Tez is a new compute engine, distinct from MapReduce. Is there
any solution based on the MapReduce engine?
> how to decide reduce numbers according to the input size of reduce stage
> rather than the input size of map stage?
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-7277
> URL: https://issues.apache.org/jira/browse/HIVE-7277
> Project: Hive
> Issue Type: New Feature
> Reporter: wangmeng
> Fix For: 0.13.0
>
>
> As we know, Hive currently decides the number of reducers solely as
> (input size of the map stage) / hive.exec.reducers.bytes.per.reducer (default 1G).
> However, I think the output size of the map stage may differ greatly from
> the original input size, so this strategy for deciding the number of
> reducers may be inappropriate.
> Is there any feature that can decide the number of reducers according
> to the output size of the map stage instead? Thanks.
> As far as I know, the reduce stage actually begins as soon as some map
> tasks have finished, rather than waiting until the whole map stage has
> finished, so I think it is also inappropriate to decide the number of
> reducers only once the whole map stage has finished.
> As someone pointed out, we could estimate the total number of reducers
> from the output size of the earliest map tasks to finish.
> In practice, however, Hive now uses filter pushdown (WHERE clauses),
> which may cause the output size to differ greatly from one map task to
> another, so this estimate is unreliable.
> Thanks.
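
The two estimates discussed in the quoted description can be sketched as follows. This is only an illustration of the arithmetic, not Hive's actual code: the function names are hypothetical, the 1 GB default is taken from the description above, and the upper bound of 999 is an assumed, illustrative cap in the spirit of hive.exec.reducers.max.

```python
# Illustrative sketch only; names and the cap value are assumptions, not Hive internals.

def estimate_reducers(stage_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Return ceil(stage_bytes / bytes_per_reducer), clamped to [1, max_reducers]."""
    wanted = (stage_bytes + bytes_per_reducer - 1) // bytes_per_reducer  # integer ceiling
    return max(1, min(max_reducers, wanted))


def extrapolate_map_output(finished_output_bytes, total_map_tasks):
    """Scale the mean output size of the finished map tasks up to all map tasks."""
    mean = sum(finished_output_bytes) / len(finished_output_bytes)
    return int(mean * total_map_tasks)


if __name__ == "__main__":
    map_input = 100 * 10**9   # 100 GB read by the map stage
    map_output = 5 * 10**9    # 5 GB left after a selective WHERE filter
    print(estimate_reducers(map_input))   # estimate from map input size
    print(estimate_reducers(map_output))  # estimate from map output size
```

With a selective filter, the estimate based on map input asks for far more reducers than the map output needs. The extrapolation helper has the weakness raised in the description: if the earliest finished map tasks happen to filter out much more (or much less) than the rest, the scaled-up figure is off by the same factor.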