[
https://issues.apache.org/jira/browse/MAPREDUCE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732456#action_12732456
]
Amareshwari Sriramadasu commented on MAPREDUCE-372:
---------------------------------------------------
As I understand it, "In ChainMapper, the Mapper classes are invoked in a
chained (or piped) fashion, the output of the first becomes the input of the
second, and so on until the last Mapper, the output of the last Mapper will be
written to the task's output." The main purpose is to reduce disk I/O.
With the new API, I see a problem in achieving similar functionality.
The new API's Mapper looks like the following:
{code}
protected void setup(Context context);
protected void map(KEYIN key, VALUEIN value, Context context);
protected void cleanup(Context context);
public void run(Context context);
{code}
If we want to chain mappers, we have to chain them in the run() method, since
run() is the only public method. But Mapper.run() runs map() on all
(key, value) pairs, so chaining at that level would amount to running a
sequence of separate map-only jobs.
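To illustrate the problem, here is a minimal sketch of chaining at the run()
level. SketchContext, SketchMapper and RunLevelChain are hypothetical stand-ins
written for this example, not Hadoop classes; the run() loop only mirrors the
setup / map-every-pair / cleanup shape described above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the new API's Context: reads input pairs, collects output.
class SketchContext {
    private final Iterator<String[]> input;
    private String[] current;
    final List<String[]> output = new ArrayList<>();

    SketchContext(List<String[]> records) {
        this.input = records.iterator();
    }

    boolean nextKeyValue() {
        if (!input.hasNext()) {
            return false;
        }
        current = input.next();
        return true;
    }

    String getCurrentKey() { return current[0]; }
    String getCurrentValue() { return current[1]; }

    void write(String key, String value) {
        output.add(new String[] {key, value});
    }
}

// Hypothetical stand-in for a new-API Mapper: only run() is public.
class SketchMapper {
    protected void setup(SketchContext context) {}

    protected void map(String key, String value, SketchContext context) {
        context.write(key, value); // identity map by default
    }

    protected void cleanup(SketchContext context) {}

    public void run(SketchContext context) {
        setup(context);
        while (context.nextKeyValue()) { // drains the whole input
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }
}

// Chaining through run(): each mapper consumes all records before the next starts,
// so the complete intermediate output has to be held between stages.
class RunLevelChain {
    static List<String[]> chain(List<String[]> input, List<SketchMapper> mappers) {
        List<String[]> records = input;
        for (SketchMapper mapper : mappers) {
            SketchContext context = new SketchContext(records);
            mapper.run(context);
            records = context.output; // fully materialized before the next stage
        }
        return records;
    }
}
```

With two mappers A and B, this processes every record through A before B sees
any of them, which is the behaviour of back-to-back map-only jobs rather than
a per-record pipe.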
One solution I can see is:
1. Make the setup(), map() and cleanup() methods public.
2. Do the chaining at map(). The drawback is that a user's Mapper.run()
implementation is not taken into account.
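The two steps above could be sketched roughly as follows. Emitter,
PublicMapper and MapLevelChain are hypothetical names invented for this
example and do not exist in Hadoop; the sketch assumes map() has been made
public and that the chain holds at least one mapper.

```java
import java.util.List;

// Hypothetical stand-in for the output side of Context.
interface Emitter {
    void write(String key, String value);
}

// Hypothetical stand-in for a new-API Mapper with setup/map/cleanup made public (step 1).
abstract class PublicMapper {
    public void setup(Emitter out) {}

    public abstract void map(String key, String value, Emitter out);

    public void cleanup(Emitter out) {}
}

// Step 2: chain at map(). Every pair written by mapper i is handed directly
// to the map() of mapper i + 1; only the last mapper writes to the task output.
class MapLevelChain extends PublicMapper {
    private final List<PublicMapper> mappers;

    MapLevelChain(List<PublicMapper> mappers) {
        if (mappers.isEmpty()) {
            throw new IllegalArgumentException("chain needs at least one mapper");
        }
        this.mappers = mappers;
    }

    // Builds the Emitter that mapper `index` writes to.
    private Emitter stageOutput(int index, Emitter taskOutput) {
        if (index + 1 == mappers.size()) {
            return taskOutput; // last mapper writes the real output
        }
        PublicMapper next = mappers.get(index + 1);
        Emitter nextOut = stageOutput(index + 1, taskOutput);
        return (key, value) -> next.map(key, value, nextOut);
    }

    @Override
    public void setup(Emitter out) {
        for (int i = 0; i < mappers.size(); i++) {
            mappers.get(i).setup(stageOutput(i, out));
        }
    }

    @Override
    public void map(String key, String value, Emitter out) {
        // One input pair flows through the whole chain before the next pair is read.
        mappers.get(0).map(key, value, stageOutput(0, out));
    }

    @Override
    public void cleanup(Emitter out) {
        for (int i = 0; i < mappers.size(); i++) {
            mappers.get(i).cleanup(stageOutput(i, out));
        }
    }
}
```

Note that the chain never calls run() on the wrapped mappers, so any run()
override in a user's Mapper is silently bypassed, which is the limitation
mentioned in point 2.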
Thoughts?
> Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-372
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
>