[
https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774027#action_12774027
]
Doug Cutting commented on MAPREDUCE-1183:
-----------------------------------------
This would be a nice API for Java.
How would we implement this? Would we serialize these to the splits file? To a
new per-job file? In a parameter to the job-submission RPC?
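One way to picture the per-job-file option (all class, method and file names
below are invented, purely for illustration): at submission time the client
could serialize each configured object into the job's staging directory, next
to the splits file.
{noformat}
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: writes one configured component (e.g. the Mapper
 *  instance) into the job's staging directory using Java serialization. */
public class JobComponentWriter {
  public static void write(Configuration conf, Path jobSubmitDir,
                           String role, Serializable component)
      throws IOException {
    Path file = new Path(jobSubmitDir, role + ".ser");   // e.g. "mapper.ser"
    FileSystem fs = jobSubmitDir.getFileSystem(conf);
    ObjectOutputStream out = new ObjectOutputStream(fs.create(file));
    try {
      out.writeObject(component);  // the task side would readObject() it back
    } finally {
      out.close();
    }
  }
}
{noformat}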
Long-term, it would be nice if job submissions could easily be made by non-Java
applications. A job submission might then specify a TaskRunner implementation
name, plus one or more opaque blobs consumed by that TaskRunner, to implement
the map, reduce, partition, InputFormat, OutputFormat, etc. A JavaTaskRunner
might use Java serialization to create its blobs, while a PythonTaskRunner or
CTaskRunner might do something else. The TaskRunners would all be implemented
in Java, but would provide the glue for native MapReduce APIs in other
languages. If we agree that this is the sort of long-term architecture we seek,
should we add it now?
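To make the shape of that concrete, here is a very rough sketch of what such a
runner contract might look like (interface and method names are invented for
illustration, not a proposed API):
{noformat}
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.util.Map;

/** Hypothetical runner contract: the framework knows only the runner's class
 *  name and a set of opaque blobs; the runner decides how to interpret them. */
interface TaskRunner {
  /** @param blobs runner-defined payloads keyed by role, e.g. "map",
   *               "reduce", "partition", "inputformat", "outputformat" */
  void runTask(Map<String, byte[]> blobs) throws Exception;
}

/** A Java runner could simply deserialize the blobs back into live objects. */
class JavaTaskRunner implements TaskRunner {
  public void runTask(Map<String, byte[]> blobs) throws Exception {
    ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(blobs.get("map")));
    Object mapper = in.readObject();  // the user's serialized Mapper instance
    // ... drive the map phase with this mapper ...
  }
}

// A PythonTaskRunner or CTaskRunner would instead hand its blobs to a child
// process speaking that language's own protocol.
{noformat}
The framework itself would then only need to record the runner class name and
the blobs in the job submission, leaving blob encoding entirely to each runner.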
> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-1183
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Affects Versions: 0.21.0
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> Currently the Map-Reduce framework uses Configuration to pass information
> about the various aspects of a job, such as the Mapper, Reducer, InputFormat,
> OutputFormat, OutputCommitter, etc., and application developers use the
> org.apache.hadoop.mapreduce.Job.set*Class APIs to set them at job-submission
> time:
> {noformat}
> Job.setMapperClass(IdentityMapper.class);
> Job.setReducerClass(IdentityReducer.class);
> Job.setInputFormatClass(TextInputFormat.class);
> Job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with
> org.apache.hadoop.mapreduce.Job via actual objects, which the framework then
> serializes:
> {noformat}
> Job.setMapper(new IdentityMapper());
> Job.setReducer(new IdentityReducer());
> Job.setInputFormat(new TextInputFormat("in"));
> Job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}
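> As a purely hypothetical illustration of what this would enable (not part of
> the proposal above), a mapper could then carry its own configuration as plain
> fields, with the framework serializing the whole object rather than a class
> name:
> {noformat}
> import java.io.IOException;
> import java.io.Serializable;
> import java.util.regex.Pattern;
>
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> // Hypothetical user code: the regex travels with the serialized object, so
> // no Configuration key is needed to pass it to the tasks.
> public class GrepMapper extends Mapper<LongWritable, Text, Text, LongWritable>
>     implements Serializable {
>   private final Pattern pattern;
>
>   public GrepMapper(String regex) {
>     this.pattern = Pattern.compile(regex);
>   }
>
>   @Override
>   protected void map(LongWritable key, Text value, Context context)
>       throws IOException, InterruptedException {
>     if (pattern.matcher(value.toString()).find()) {
>       context.write(value, new LongWritable(1));
>     }
>   }
> }
>
> // Job.setMapper(new GrepMapper("ERROR|WARN"));  // hypothetical new API
> {noformat}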
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.