[
https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774027#action_12774027
]
Doug Cutting commented on MAPREDUCE-1183:
-----------------------------------------
This would be a nice API for Java.
How would we implement this? Would we serialize these to the splits file? To a
new per-job file? In a parameter to the job-submission RPC?
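One way to picture the per-job-file option (all class, method and file names
below are invented, purely for illustration): at submission time the client
could serialize each configured object into the job's staging directory, next
to the splits file.
{noformat}
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: writes one configured component (e.g. the Mapper
 *  instance) into the job's staging directory using Java serialization. */
public class JobComponentWriter {
  public static void write(Configuration conf, Path jobSubmitDir,
                           String role, Serializable component)
      throws IOException {
    Path file = new Path(jobSubmitDir, role + ".ser");   // e.g. "mapper.ser"
    FileSystem fs = jobSubmitDir.getFileSystem(conf);
    ObjectOutputStream out = new ObjectOutputStream(fs.create(file));
    try {
      out.writeObject(component);  // the task side would readObject() it back
    } finally {
      out.close();
    }
  }
}
{noformat}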
Long-term, it would be nice if job submissions could easily be made by non-Java
applications. A job submission might then specify a TaskRunner implementation
name, plus one or more opaque blobs consumed by that TaskRunner, to implement
the map, reduce, partition, InputFormat, OutputFormat, etc. A JavaTaskRunner
might use Java serialization to create its blobs, while a PythonTaskRunner or
CTaskRunner might do something else. The TaskRunners would all be implemented
in Java, but would provide the glue for native MapReduce APIs in other
languages. If we agree that this is the sort of long-term architecture we seek,
should we add it now?
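To make the shape of that concrete, here is a very rough sketch of what such a
runner contract might look like (interface and method names are invented for
illustration, not a proposed API):
{noformat}
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.util.Map;

/** Hypothetical runner contract: the framework knows only the runner's class
 *  name and a set of opaque blobs; the runner decides how to interpret them. */
interface TaskRunner {
  /** @param blobs runner-defined payloads keyed by role, e.g. "map",
   *               "reduce", "partition", "inputformat", "outputformat" */
  void runTask(Map<String, byte[]> blobs) throws Exception;
}

/** A Java runner could simply deserialize the blobs back into live objects. */
class JavaTaskRunner implements TaskRunner {
  public void runTask(Map<String, byte[]> blobs) throws Exception {
    ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(blobs.get("map")));
    Object mapper = in.readObject();  // the user's serialized Mapper instance
    // ... drive the map phase with this mapper ...
  }
}

// A PythonTaskRunner or CTaskRunner would instead hand its blobs to a child
// process speaking that language's own protocol.
{noformat}
The framework itself would then only need to record the runner class name and
the blobs in the job submission, leaving blob encoding entirely to each runner.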
> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-1183
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Affects Versions: 0.21.0
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> Currently the Map-Reduce framework uses Configuration to pass information
> about the various aspects of a job, such as the Mapper, Reducer, InputFormat,
> OutputFormat, OutputCommitter, etc., and application developers use the
> org.apache.hadoop.mapreduce.Job.set*Class APIs to set them at job-submission
> time:
> {noformat}
> Job.setMapperClass(IdentityMapper.class);
> Job.setReducerClass(IdentityReducer.class);
> Job.setInputFormatClass(TextInputFormat.class);
> Job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with
> org.apache.hadoop.mapreduce.Job via actual objects, which the framework then
> serializes:
> {noformat}
> Job.setMapper(new IdentityMapper());
> Job.setReducer(new IdentityReducer());
> Job.setInputFormat(new TextInputFormat("in"));
> Job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}
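> As a purely hypothetical illustration of what this would enable (not part of
> the proposal above), a mapper could then carry its own configuration as plain
> fields, with the framework serializing the whole object rather than a class
> name:
> {noformat}
> import java.io.IOException;
> import java.io.Serializable;
> import java.util.regex.Pattern;
>
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> // Hypothetical user code: the regex travels with the serialized object, so
> // no Configuration key is needed to pass it to the tasks.
> public class GrepMapper extends Mapper<LongWritable, Text, Text, LongWritable>
>     implements Serializable {
>   private final Pattern pattern;
>
>   public GrepMapper(String regex) {
>     this.pattern = Pattern.compile(regex);
>   }
>
>   @Override
>   protected void map(LongWritable key, Text value, Context context)
>       throws IOException, InterruptedException {
>     if (pattern.matcher(value.toString()).find()) {
>       context.write(value, new LongWritable(1));
>     }
>   }
> }
>
> // Job.setMapper(new GrepMapper("ERROR|WARN"));  // hypothetical new API
> {noformat}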
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.