Steve Loughran created MAPREDUCE-6956:
-----------------------------------------

             Summary: FileOutputCommitter to gain abstract superclass 
PathOutputCommitter
                 Key: MAPREDUCE-6956
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6956
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 3.0.0-beta1
            Reporter: Steve Loughran
            Assignee: Steve Loughran


This is the initial step of MAPREDUCE-6823, which proposes a factory behind 
{{FileOutputFormat}} to create different committers for different filesystems, 
if so configured.

This patch simply adds the new abstract superclass of {{FileOutputCommitter}}, 
{{PathOutputCommitter extends OutputCommitter}}. This abstract class declares 
{{getWorkPath()}} as an abstract method, with {{FileOutputCommitter}} 
being the implementation.

{{FileOutputFormat}} then relaxes its requirement of any committer returned by 
{{getOutputCommitter()}}, so that instead of requiring a 
{{FileOutputCommitter}} or subclass, it only needs a {{PathOutputCommitter}}, 
using {{PathOutputCommitter.getWorkPath()}} to get the work path.
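
As a rough sketch of the shape being proposed: the classes below are simplified stand-ins that reuse the names from this issue, not the real Hadoop classes. {{java.nio.file.Path}} stands in for Hadoop's {{Path}}, the {{_temporary}} layout and {{defaultWorkFile()}} helper are hypothetical, and the lifecycle methods of {{OutputCommitter}} are omitted.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class CommitterHierarchySketch {
    public static void main(String[] args) {
        OutputCommitter committer =
            new FileOutputCommitter(Paths.get("out"), "attempt_0001");
        // The format only needs the work path, not the concrete committer type.
        System.out.println(
            FileOutputFormatSketch.defaultWorkFile(committer, "part-r-00000"));
    }
}

/** Stand-in for OutputCommitter; job/task lifecycle methods omitted. */
abstract class OutputCommitter { }

/** The proposed lean parent: all it adds is the work-path accessor. */
abstract class PathOutputCommitter extends OutputCommitter {
    abstract Path getWorkPath();
}

/** Stand-in for the existing committer, now just one implementation. */
class FileOutputCommitter extends PathOutputCommitter {
    private final Path workPath;

    FileOutputCommitter(Path outputDir, String attemptId) {
        // Hypothetical directory layout, for illustration only.
        this.workPath = outputDir.resolve("_temporary").resolve(attemptId);
    }

    @Override
    Path getWorkPath() {
        return workPath;
    }
}

/** Stand-in for the part of FileOutputFormat that needs the work path. */
class FileOutputFormatSketch {
    static Path defaultWorkFile(OutputCommitter committer, String name) {
        // Relaxed requirement: any PathOutputCommitter will do.
        if (!(committer instanceof PathOutputCommitter)) {
            throw new IllegalStateException(
                "Committer does not expose a work path: "
                    + committer.getClass().getName());
        }
        return ((PathOutputCommitter) committer).getWorkPath().resolve(name);
    }
}
```

The only type the format code names is {{PathOutputCommitter}}, so a committer that does not descend from {{FileOutputCommitter}} passes the same check.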

What does that do?

It allows people to implement subclasses of {{FileOutputFormat}} which can 
provide their own committers *which don't need to inherit the complexity that 
FileOutputCommitter has acquired over time*.

Currently anyone implementing a new committer (example: Netflix S3 committer) 
needs to subclass {{FileOutputCommitter}}. That class is too complex to 
understand except under a debugger: it has co-recursive routines and lots of 
methods which need to be overridden to guarantee a safe subclass. Because of 
its critical role and known subclassing, it isn't ever going to be cleaned up.

A new, lean, parent class which {{FileOutputFormat}} can handle allows people 
to write new committers which don't have to worry about implementation details 
of {{FileOutputCommitter}}, but instead how well they implement the semantics 
of committing work.
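
To illustrate what such a lean committer might look like, here is a minimal sketch that extends the new parent directly. Everything in it is an assumption for illustration: the classes are local stand-ins rather than the Hadoop ones, the task lifecycle methods take no context arguments, {{SimpleStagingCommitter}} and its {{_staging}} directory are hypothetical, and the local filesystem stands in for a real store.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LeanCommitterSketch {
    public static void main(String[] args) {
        System.out.println("commit semantics held: " + runDemo());
    }

    /** True if output was hidden before commitTask() and visible after it. */
    static boolean runDemo() {
        try {
            Path dest = Files.createTempDirectory("job-output");
            SimpleStagingCommitter committer = new SimpleStagingCommitter(dest);
            committer.setupTask();
            Files.write(committer.getWorkPath().resolve("part-00000"),
                "data".getBytes());
            boolean hiddenBefore = !Files.exists(dest.resolve("part-00000"));
            committer.commitTask();
            boolean visibleAfter = Files.exists(dest.resolve("part-00000"));
            return hiddenBefore && visibleAfter;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Stand-in for OutputCommitter, reduced to three task-level methods. */
abstract class OutputCommitter {
    abstract void setupTask() throws IOException;
    abstract void commitTask() throws IOException;
    abstract void abortTask() throws IOException;
}

/** The lean parent class: adds only getWorkPath(). */
abstract class PathOutputCommitter extends OutputCommitter {
    abstract Path getWorkPath();
}

/** Hypothetical committer that inherits nothing from FileOutputCommitter. */
class SimpleStagingCommitter extends PathOutputCommitter {
    private final Path destination;
    private final Path workPath;

    SimpleStagingCommitter(Path destination) {
        this.destination = destination;
        this.workPath = destination.resolve("_staging");
    }

    @Override
    Path getWorkPath() {
        return workPath;
    }

    @Override
    void setupTask() throws IOException {
        Files.createDirectories(workPath);
    }

    @Override
    void commitTask() throws IOException {
        // Promote every staged file into the destination, then clean up.
        for (Path file : stagedFiles()) {
            Files.move(file, destination.resolve(file.getFileName()));
        }
        Files.delete(workPath);
    }

    @Override
    void abortTask() throws IOException {
        // Discard staged output so nothing becomes visible.
        for (Path file : stagedFiles()) {
            Files.delete(file);
        }
        Files.deleteIfExists(workPath);
    }

    private List<Path> stagedFiles() throws IOException {
        if (!Files.exists(workPath)) {
            return List.of();
        }
        try (Stream<Path> files = Files.list(workPath)) {
            return files.collect(Collectors.toList());
        }
    }
}
```

The author of a class like this only has to reason about when output becomes visible, which is the "semantics of committing work" point above, rather than about the internals of {{FileOutputCommitter}}.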

The full MAPREDUCE-6823 goes beyond this with a change to {{FileOutputFormat}} 
adding a factory for creating FS-specific {{PathOutputCommitter}} instances. 
This patch doesn't include that, as it needs to be reviewed in the context of 
HADOOP-13786 and ideally at least one committer for another store, so 
people can say "this factory model works".

All I'm proposing here is: tune the committer class hierarchy in MRv2 so that 
people can more easily implement committers and, once that factory is done, 
switch to it easily. And I'd like this in branch-3 from the outset, so 
existing code which calls {{FileOutputFormat.getOutputCommitter()}} to get a 
{{FileOutputCommitter}} *just to call getWorkPath()* can move to the new 
interface across all of Hadoop 3.


