[jira] [Updated] (HIVE-7277) how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?

Stamatis Zampetakis (Jira) Fri, 21 Oct 2022 00:21:27 -0700


     [ https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stamatis Zampetakis updated HIVE-7277:
--------------------------------------
    Fix Version/s:     (was: 0.13.0)

I cleared the fixVersion field since this ticket is still open. Please review 
this ticket and if the fix is already committed to a specific version please 
set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA 
guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] 
the fixVersion should be set only when the issue is resolved/closed.

> how to decide reduce numbers according to the input size of reduce stage
> rather than the input size of map stage?
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7277
>                 URL: https://issues.apache.org/jira/browse/HIVE-7277
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: WangMeng
>            Priority: Major
>
> As we know, Hive currently decides the number of reducers solely from the
> input size of the map stage divided by hive.exec.reducers.bytes.per.reducer
> (default 1 GB).
> However, the output size of the map stage may differ greatly from the
> original input size, so I think this strategy for deciding the number of
> reducers may be inappropriate.
> Is there any feature that decides the number of reducers based only on the
> output of the map stage? Thanks.
> As far as I know, the reduce stage actually begins as soon as some map
> tasks have finished, rather than waiting until the whole map stage has
> finished, so I think deciding the number of reducers only after the whole
> map stage has finished is also inappropriate.
> Someone has suggested estimating the total number of reducers from the
> output size of the earliest finished map tasks. However, Hive now uses
> filter pushdown (WHERE), which can make the output size vary greatly from
> one map task to another.
> So this estimation is also inappropriate.
> Thanks.
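The three estimation strategies discussed in the report can be sketched as a small model. This is a simplified Python illustration, not Hive's source code. The function names are hypothetical. The cap of 999 reducers, standing in for hive.exec.reducers.max, is an assumption, because that default has differed between Hive versions. The byte counts in the demo are made-up numbers. The demo compares three estimates: one from map input, one from actual map output, and one extrapolated from the earliest finished map tasks. A pushed-down filter that empties most map tasks pulls the extrapolated estimate away from the true map-output estimate.

```python
# Hypothetical model of reducer-count heuristics; NOT Hive's implementation.
from typing import List

MB = 1024 ** 2
GB = 1024 ** 3


def reducers_for(total_bytes: int,
                 bytes_per_reducer: int = 1 * GB,  # stands in for hive.exec.reducers.bytes.per.reducer
                 max_reducers: int = 999) -> int:  # assumed cap, stands in for hive.exec.reducers.max
    """Ceiling of total_bytes / bytes_per_reducer, clamped to [1, max_reducers]."""
    estimate = (total_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, estimate))


def extrapolate_map_output(finished_outputs: List[int], total_maps: int) -> int:
    """Hypothetical: scale the average output of the finished map tasks to all map tasks."""
    if not finished_outputs:
        raise ValueError("need at least one finished map task")
    average = sum(finished_outputs) // len(finished_outputs)
    return average * total_maps


if __name__ == "__main__":
    # 10 map tasks reading 100 GB in total; a pushed-down WHERE filter keeps
    # rows only in the first two splits (made-up numbers for illustration).
    per_task_output = [1 * GB, 1 * GB] + [0] * 8

    print(reducers_for(100 * GB))               # from map input: 100
    print(reducers_for(sum(per_task_output)))   # from actual map output: 2
    early = extrapolate_map_output(per_task_output[:2], len(per_task_output))
    print(reducers_for(early))                  # from the 2 earliest maps: 10
```

In this toy case the input-based rule asks for 100 reducers and the early-sample extrapolation asks for 10, while the map output alone would justify only 2. That gap is the mismatch the report describes.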



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
