[ https://issues.apache.org/jira/browse/MAPREDUCE-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated MAPREDUCE-6956:
--------------------------------------
Status: Patch Available (was: Open)
> FileOutputCommitter to gain abstract superclass PathOutputCommitter
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-6956
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6956
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Affects Versions: 3.0.0-beta1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: MAPREDUCE-6956-001.patch
>
>
> This is the initial step of MAPREDUCE-6823, which proposes a factory behind
> {{FileOutputFormat}} to create different committers for different
> filesystems, if so configured.
> This patch simply adds the new abstract superclass of
> {{FileOutputCommitter}}, {{PathOutputCommitter extends OutputCommitter}}.
> This abstract class declares {{getWorkPath()}} as an abstract method, which
> {{FileOutputCommitter}} implements.
> {{FileOutputFormat}} then relaxes its requirement on the committer returned
> by {{getOutputCommitter()}}: instead of requiring a
> {{FileOutputCommitter}} or subclass, it only needs a {{PathOutputCommitter}},
> and uses {{PathOutputCommitter.getWorkPath()}} to get the work path.
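The hierarchy change and the relaxed check can be sketched as below. This is a simplified, self-contained illustration, not the patch itself: `OutputCommitter`, `PathOutputCommitter` and `FileOutputCommitter` here are stand-ins (the real classes have many more lifecycle methods and use Hadoop's `Path` type rather than `String`), and `workFile` is a hypothetical helper standing in for the place where `FileOutputFormat` asks the committer for its work path.

```java
public class CommitterHierarchySketch {

  /** Stand-in for OutputCommitter; the real class has many more lifecycle methods. */
  public abstract static class OutputCommitter {
    public abstract void commitTask();
  }

  /** Stand-in for the proposed lean superclass: its only addition is getWorkPath(). */
  public abstract static class PathOutputCommitter extends OutputCommitter {
    public abstract String getWorkPath();
  }

  /** Stand-in for FileOutputCommitter, now one implementation among many. */
  public static class FileOutputCommitter extends PathOutputCommitter {
    private final String outputPath;

    public FileOutputCommitter(String outputPath) {
      this.outputPath = outputPath;
    }

    @Override
    public String getWorkPath() {
      // Hypothetical layout; the real committer derives it from the task attempt.
      return outputPath + "/_temporary/attempt_0";
    }

    @Override
    public void commitTask() {
      // No-op in this sketch.
    }
  }

  /**
   * Hypothetical stand-in for the relaxed check in FileOutputFormat: any
   * PathOutputCommitter is accepted, not only FileOutputCommitter and its subclasses.
   */
  public static String workFile(OutputCommitter committer, String name) {
    if (!(committer instanceof PathOutputCommitter)) {
      throw new IllegalArgumentException("Not a PathOutputCommitter: " + committer);
    }
    return ((PathOutputCommitter) committer).getWorkPath() + "/" + name;
  }

  public static void main(String[] args) {
    OutputCommitter committer = new FileOutputCommitter("/out");
    // prints /out/_temporary/attempt_0/part-r-00000
    System.out.println(workFile(committer, "part-r-00000"));
  }
}
```

The only contract the output format relies on is `getWorkPath()`; everything else about how a committer commits is left to the subclass.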
> What does that do?
> It allows people to implement subclasses of {{FileOutputFormat}} which can
> provide their own committers *which don't need to inherit the complexity that
> FileOutputCommitter has acquired over time*
> Currently anyone implementing a new committer (example: the Netflix S3
> committer) needs to subclass {{FileOutputCommitter}}. That class is too
> complex to understand except under a debugger: it has co-recursive routines
> and many methods which must be overridden to guarantee a safe subclass, and,
> because of its critical role and its known subclasses, it is never going to
> be cleaned up.
> A new, lean parent class which {{FileOutputFormat}} can handle allows people
> to write new committers which don't have to worry about the implementation
> details of {{FileOutputCommitter}}, and can instead focus on how well they
> implement the semantics of committing work.
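As an illustration of a committer that only has to get the commit semantics right, here is a hypothetical `StagingCommitter` that extends the lean superclass directly and inherits nothing from `FileOutputCommitter`. It is a sketch under stated assumptions: the base classes are simplified stand-ins reduced to task commit/abort, `recordWrite` is an invented hook, and "storage" is simulated with in-memory lists rather than a real filesystem or object store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LeanCommitterSketch {

  /** Stand-in for OutputCommitter, reduced to the task commit/abort pair. */
  public abstract static class OutputCommitter {
    public abstract void commitTask();
    public abstract void abortTask();
  }

  /** Stand-in for the proposed PathOutputCommitter. */
  public abstract static class PathOutputCommitter extends OutputCommitter {
    public abstract String getWorkPath();
  }

  /** Hypothetical committer that extends PathOutputCommitter directly. */
  public static class StagingCommitter extends PathOutputCommitter {
    private final String stagingDir;
    private final List<String> pending = new ArrayList<>();
    private final List<String> published = new ArrayList<>();

    public StagingCommitter(String stagingDir) {
      this.stagingDir = stagingDir;
    }

    @Override
    public String getWorkPath() {
      return stagingDir;
    }

    /** Hypothetical hook: the task reports each file it wrote under getWorkPath(). */
    public void recordWrite(String file) {
      pending.add(file);
    }

    @Override
    public void commitTask() {
      // Commit: everything the task wrote is published together.
      published.addAll(pending);
      pending.clear();
    }

    @Override
    public void abortTask() {
      // Abort: pending output is discarded and never becomes visible.
      pending.clear();
    }

    public List<String> published() {
      return Collections.unmodifiableList(published);
    }
  }

  public static void main(String[] args) {
    StagingCommitter c = new StagingCommitter("/tmp/staging/attempt_0");
    c.recordWrite(c.getWorkPath() + "/part-r-00000");
    c.abortTask(); // first try is discarded
    c.recordWrite(c.getWorkPath() + "/part-r-00000");
    c.commitTask();
    System.out.println(c.published()); // [/tmp/staging/attempt_0/part-r-00000]
  }
}
```

The whole class is a few dozen lines because the only obligations are `getWorkPath()` plus the commit/abort semantics, which is the point of the lean superclass.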
> The full MAPREDUCE-6823 goes beyond this by changing {{FileOutputFormat}} to
> use a factory for creating FS-specific {{PathOutputCommitter}} instances.
> This patch doesn't include that, as it needs to be reviewed in the context of
> HADOOP-13786 and, ideally, alongside at least one committer for another
> store, so people can say "this factory model works".
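The factory idea deferred to MAPREDUCE-6823 can be pictured as a lookup from the output path's filesystem scheme to a committer constructor. The sketch below is hypothetical throughout: the registry, `register`/`create` methods and `ObjectStoreCommitter` are invented for illustration, `s3a` is used only as an example scheme, and the committer classes are minimal stand-ins. It is not the design under review.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CommitterFactorySketch {

  /** Stand-in for the proposed PathOutputCommitter. */
  public abstract static class PathOutputCommitter {
    public abstract String getWorkPath();
  }

  /** Stand-in for FileOutputCommitter, used as the default. */
  public static class FileOutputCommitter extends PathOutputCommitter {
    private final URI output;
    public FileOutputCommitter(URI output) { this.output = output; }
    @Override public String getWorkPath() { return output + "/_temporary"; }
  }

  /** Hypothetical store-specific committer, e.g. for an object store. */
  public static class ObjectStoreCommitter extends PathOutputCommitter {
    private final URI output;
    public ObjectStoreCommitter(URI output) { this.output = output; }
    @Override public String getWorkPath() { return "file:///tmp/staging" + output.getPath(); }
  }

  // Hypothetical registry: filesystem scheme -> committer constructor.
  private final Map<String, Function<URI, PathOutputCommitter>> byScheme = new HashMap<>();

  public void register(String scheme, Function<URI, PathOutputCommitter> factory) {
    byScheme.put(scheme, factory);
  }

  /** Falls back to FileOutputCommitter when no scheme-specific factory is registered. */
  public PathOutputCommitter create(URI output) {
    return byScheme.getOrDefault(output.getScheme(), FileOutputCommitter::new).apply(output);
  }

  public static void main(String[] args) {
    CommitterFactorySketch factory = new CommitterFactorySketch();
    factory.register("s3a", ObjectStoreCommitter::new);
    // prints ObjectStoreCommitter, then FileOutputCommitter
    System.out.println(factory.create(URI.create("s3a://bucket/out")).getClass().getSimpleName());
    System.out.println(factory.create(URI.create("hdfs://nn/out")).getClass().getSimpleName());
  }
}
```

Because the factory only ever hands back a `PathOutputCommitter`, it can be layered on later without touching callers that already depend on the lean superclass.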
> All I'm proposing here is: tune the committer class hierarchy in MRv2 so that
> people can more easily implement committers, and so that switching to the
> factory is easy once it is done. And I'd like this in branch-3 from the
> outset, so existing code which calls {{FileOutputFormat.getOutputCommitter()}}
> to get a {{FileOutputCommitter}} *just to call getWorkPath()* can move to the
> new class across all of Hadoop 3.
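The caller-side migration amounts to changing a cast target. In this sketch (stand-in classes again; `CustomCommitter`, `oldWorkPath` and `newWorkPath` are hypothetical names), the old pattern of casting to `FileOutputCommitter` fails for any committer outside that subtree, while casting to `PathOutputCommitter` works for every implementation.

```java
public class CallerMigrationSketch {

  /** Stand-in for OutputCommitter. */
  public abstract static class OutputCommitter { }

  /** Stand-in for the proposed PathOutputCommitter. */
  public abstract static class PathOutputCommitter extends OutputCommitter {
    public abstract String getWorkPath();
  }

  /** Stand-in for FileOutputCommitter. */
  public static class FileOutputCommitter extends PathOutputCommitter {
    @Override public String getWorkPath() { return "/out/_temporary/attempt_0"; }
  }

  /** Hypothetical committer that does not extend FileOutputCommitter. */
  public static class CustomCommitter extends PathOutputCommitter {
    @Override public String getWorkPath() { return "/staging/attempt_0"; }
  }

  /** Old pattern: only works for FileOutputCommitter and its subclasses. */
  public static String oldWorkPath(OutputCommitter c) {
    return ((FileOutputCommitter) c).getWorkPath(); // ClassCastException for CustomCommitter
  }

  /** New pattern: works for any PathOutputCommitter. */
  public static String newWorkPath(OutputCommitter c) {
    return ((PathOutputCommitter) c).getWorkPath();
  }

  public static void main(String[] args) {
    OutputCommitter custom = new CustomCommitter();
    System.out.println(newWorkPath(custom)); // /staging/attempt_0
    try {
      oldWorkPath(custom);
    } catch (ClassCastException e) {
      System.out.println("old cast fails: " + e.getClass().getSimpleName());
    }
  }
}
```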
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)