[jira] Commented: (HADOOP-3702) add support for chaining Maps in a single Map and after a Reduce [M*/RM*]

Alejandro Abdelnur (JIRA) Mon, 07 Jul 2008 00:49:57 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610905#action_12610905
 ]


Alejandro Abdelnur commented on HADOOP-3702:
--------------------------------------------

Example of creating and submitting a chain job:

{code:java}

JobConf conf = new JobConf();

// chaining maps in the Map phase

Properties mapAConf = new Properties();
mapAConf.setProperty("a", "A");
ChainMapper.addMapper(conf, AMap.class, mapAConf);

ChainMapper.addMapper(conf, BMap.class, null);

// setting the reducer

Properties reduceConf = new Properties();
ChainReducer.setReducer(conf, XReduce.class, reduceConf);

// chaining maps in the Reduce phase

ChainReducer.addMapper(conf, CMap.class, null);

ChainReducer.addMapper(conf, DMap.class, null);

...

FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);

JobClient jc = new JobClient(conf);
RunningJob job = jc.submitJob(conf);

{code}
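
To illustrate the mechanism behind the calls above, here is a minimal plain-Java sketch of chaining by replacing the output collector. It does not use Hadoop and is not the ChainMapper implementation; {{Collector}}, {{SimpleMapper}}, {{chain}} and {{run}} are hypothetical names invented for this sketch. Each map writes to a collector that forwards the record straight to the next map, so nothing is written to disk between the steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChainSketch {

    /** Hypothetical stand-in for an output collector: receives emitted records. */
    interface Collector {
        void collect(String key, String value);
    }

    /** Hypothetical stand-in for a map: consumes one record, emits zero or more. */
    interface SimpleMapper {
        void map(String key, String value, Collector out);
    }

    /**
     * Builds a collector that feeds each record into mappers[index], whose own
     * output goes to the collector built for the rest of the chain. The maps
     * only ever see a Collector, so they cannot tell they are chained.
     */
    static Collector chain(List<SimpleMapper> mappers, int index, Collector last) {
        if (index == mappers.size()) {
            return last; // end of the chain: hand records to the real output
        }
        SimpleMapper current = mappers.get(index);
        Collector downstream = chain(mappers, index + 1, last);
        return (key, value) -> current.map(key, value, downstream);
    }

    /** Runs every input record through the whole chain and returns the final output. */
    static List<String> run(List<SimpleMapper> mappers, List<String[]> input) {
        List<String> output = new ArrayList<>();
        Collector head = chain(mappers, 0, (key, value) -> output.add(key + "=" + value));
        for (String[] record : input) {
            head.collect(record[0], record[1]);
        }
        return output;
    }

    public static void main(String[] args) {
        SimpleMapper upper = (k, v, out) -> out.collect(k, v.toUpperCase());
        SimpleMapper duplicate = (k, v, out) -> {
            out.collect(k, v);
            out.collect(k, v + "!");
        };
        List<String[]> input = Arrays.asList(
                new String[] {"k1", "a"},
                new String[] {"k2", "b"});
        // prints [k1=A, k1=A!, k2=B, k2=B!]
        System.out.println(run(Arrays.asList(upper, duplicate), input));
    }
}
```

In this sketch the same wiring would serve for the maps that follow the reduce: the reduce's collector would simply be the head of the post-reduce chain.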

> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3702
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Minor
>
> On the same input, we usually need to run multiple Maps one after the other 
> without a Reduce in between. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a 
> significant amount of disk I/O will be avoided. 
> Similarly all post-Reduce Maps can be chained together and run in the Reduce 
> phase after the Reduce.
> This could be done with ChainMapper and ChainReducer classes that would 
> manage the chain of Maps and override the OutputCollector to implement the 
> chaining.
> The Maps and Reduce that are part of the Chain are unaware that they are 
> executed in a Chain. They receive records via the {{map}} and {{reduce}} 
> methods and emit their output via the {{OutputCollector}}.
> The API would look something like:
> {code:java}
> public class ChainMapper implements Mapper {
>   public static void addMapper(JobConf job, Class<? extends Mapper> klass, 
> Properties mapperConf);
>   ...
> }
> public class ChainReducer implements Reducer {
>   public static void setReducer(JobConf job, Class<? extends Reducer> klass, 
> Properties reducerConf);
>   public static void addMapper(JobConf job, Class<? extends Mapper> klass, 
> Properties mapperConf);
>   ...
> }
> {code}
> The {{Properties}} configuration passed for a {{Mapper}} or {{Reducer}} 
> when adding it to the chain is injected into a copy of the job's 
> configuration. This allows maps to be configured as usual without being aware 
> that they are in a chain.
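
The per-element configuration described in the issue can be pictured with a small plain-Java sketch. It is only an illustration under stated assumptions, not the proposed implementation: a {{Map<String, String>}} stands in for the job's configuration, and {{mapperConf}} and the key {{job.name}} are hypothetical. The point is that each chain element gets its own copy with its {{Properties}} overlaid, while the job's configuration and the other elements' copies stay untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfInjectionSketch {

    /**
     * Returns a copy of jobConf with the element's properties overlaid.
     * jobConf itself is never modified. A null overrides argument (as passed
     * for some maps in the example) yields a plain copy.
     */
    static Map<String, String> mapperConf(Map<String, String> jobConf, Properties overrides) {
        Map<String, String> copy = new HashMap<>(jobConf);
        if (overrides != null) {
            for (String name : overrides.stringPropertyNames()) {
                copy.put(name, overrides.getProperty(name));
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("job.name", "chain-demo"); // illustrative key, not a Hadoop setting

        Properties mapAConf = new Properties();
        mapAConf.setProperty("a", "A");

        Map<String, String> confForA = mapperConf(jobConf, mapAConf);
        Map<String, String> confForB = mapperConf(jobConf, null);

        System.out.println(confForA.get("a"));         // prints A
        System.out.println(confForB.containsKey("a")); // prints false
        System.out.println(jobConf.containsKey("a"));  // prints false
    }
}
```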

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
