[jira] [Commented] (MAPREDUCE-4868) Allow multiple iteration for map

Jerry Chen (JIRA) Mon, 10 Dec 2012 19:31:28 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528641#comment-13528641
 ]


Jerry Chen commented on MAPREDUCE-4868:
---------------------------------------

Radim, thank you very much for your quick response. I checked ChainMapper and 
it turns out not to be quite the same thing as what is proposed here. ChainMapper 
iterates over the map input only once and passes each key/value pair through the 
chain of mappers. The difference here is that the mapper would be able to run 
multiple iterations over its input. At first glance this seems to make no sense. 
But each iteration needs its own parameter data (as distinct from the input 
data), and multiple iterations make sense when only part of the parameter data 
is available during each iteration.

In the Hive optimization problem I mentioned above, the parameter data may not 
fit in memory, so we need to partition it, load one partition into memory at a 
time, and make a pass over the input data for each partition, which means going 
through the input multiple times. This avoids a complex reduce stage.
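
To illustrate the idea, here is a minimal plain-Java sketch with no Hadoop classes. It assumes the parameter data acts as a lookup table keyed by the input key, which may not match the actual Hive case; the names PartitionedPasses and run, and the use of Supplier to stand in for loading a partition from storage, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class PartitionedPasses {
    // Makes one full pass over the input per parameter partition.
    // Each Supplier stands in for loading one partition into memory,
    // so only one partition is referenced by the loop at a time.
    public static List<String> run(List<String> inputKeys,
                                   List<Supplier<Map<String, String>>> partitions) {
        List<String> output = new ArrayList<>();
        for (Supplier<Map<String, String>> loader : partitions) {
            Map<String, String> partition = loader.get(); // load one partition
            for (String key : inputKeys) {                // full pass over the input
                String value = partition.get(key);
                if (value != null) {
                    output.add(key + "=" + value);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        first.put("a", "1");
        Map<String, String> second = new HashMap<>();
        second.put("b", "2");

        List<Supplier<Map<String, String>>> partitions = new ArrayList<>();
        partitions.add(() -> first);
        partitions.add(() -> second);

        System.out.println(run(Arrays.asList("a", "b", "c"), partitions));
    }
}
```

The input is read once per partition, so the cost is extra map-side I/O in exchange for not needing a reduce stage.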

Does this make sense, or is there another approach that provides equivalent 
performance?

Thanks again.
                
> Allow multiple iteration for map
> --------------------------------
>
>                 Key: MAPREDUCE-4868
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Jerry Chen
>             Fix For: 3.0.0, 2.0.3-alpha
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the Mapper class allows advanced users to override the "public void 
> run(Context context)" method for more control over the execution of the 
> mapper, but the Context interface limits the operations on the data, and 
> those operations are the foundation of "more control".
> One use case: while working on a Hive optimization problem, I want to make 
> two passes over the input data instead of using another job or task (which 
> may slow down the whole process). Each pass does the same thing but with 
> different parameters.
> This is a new paradigm of Map Reduce usage and can be achieved easily by 
> extending the Context interface a little to give more control over the data, 
> such as resetting the input.
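
The proposal in the description can be modelled in plain Java as follows. This is only a sketch under stated assumptions: ResettableContext and ListContext are hypothetical stand-ins for the part of the Context interface that run() uses, and reset() is the proposed extension, not an existing Hadoop method. Only the nextKeyValue() and getCurrentValue() names mirror the real Context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResettableRun {
    // Hypothetical model of the context seen by run(); reset() is the
    // proposed addition and does not exist in Hadoop.
    public interface ResettableContext {
        boolean nextKeyValue();
        String getCurrentValue();
        void reset();
        void write(String value);
    }

    // In-memory implementation backed by a list, for illustration only.
    public static class ListContext implements ResettableContext {
        private final List<String> input;
        private int position = -1;
        public final List<String> output = new ArrayList<>();

        public ListContext(List<String> input) {
            this.input = input;
        }

        public boolean nextKeyValue() {
            position++;
            return position < input.size();
        }

        public String getCurrentValue() {
            return input.get(position);
        }

        public void reset() {
            position = -1; // rewind to the start of the input
        }

        public void write(String value) {
            output.add(value);
        }
    }

    // An overridden run() in the proposed style: one pass per parameter,
    // rewinding the input between passes.
    public static void run(ResettableContext context, List<String> parameters) {
        for (String parameter : parameters) {
            while (context.nextKeyValue()) {
                context.write(parameter + ":" + context.getCurrentValue());
            }
            context.reset();
        }
    }

    public static void main(String[] args) {
        ListContext context = new ListContext(Arrays.asList("r1", "r2"));
        run(context, Arrays.asList("p1", "p2"));
        System.out.println(context.output); // [p1:r1, p1:r2, p2:r1, p2:r2]
    }
}
```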

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
