[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061332#comment-16061332
 ] 

Steve Loughran commented on MAPREDUCE-6823:
-------------------------------------------


**2017/06/23 update**: no, that's just messy. Best to find where those committers 
are used and allow them to be more generic. Example: all the Parquet committer 
does is add an optional schema summary file. If you don't want that, any 
FileOutputFormat (FOF) committer can be used.

Resubmitting the original patch, as it stands, from HADOOP-13786.

> FileOutputFormat to support configurable FileOutputCommitter factory
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6823
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6823
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha2
>         Environment: Targeting S3 as the output of work
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch
>
>
> In HADOOP-13786 I'm adding a custom subclass for FileOutputFormat, one which 
> can talk directly to the S3A filesystem for more efficient operations, better 
> failure modes, and, most critically, as part of HADOOP-13345, atomic commit 
> of output. The normal committer relies on directory rename() being atomic for 
> this; for S3 we don't have that luxury.
> To support a custom committer, we need to be able to tell FileOutputFormat 
> (and implicitly, all subclasses which don't have their own custom committer), 
> to use our new {{S3AOutputCommitter}}.
> I propose:
> # {{FileOutputFormat}} takes a factory to create committers.
> # The factory takes a URI and a {{TaskAttemptContext}} and returns a committer.
> # The default implementation always returns a {{FileOutputCommitter}}.
> # A configuration option allows a new factory to be named.
> # An {{S3AOutputCommitterFactory}} returns a {{FileOutputCommitter}} or a 
> new {{S3AOutputCommitter}}, depending upon the URI of the destination.
> Note that MRv1 already supports configurable committers; this is only the V2 
> API.
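
The proposal above could be sketched roughly as follows. This is a minimal, dependency-free illustration and not the attached patch: the `Committer` and `CommitterFactory` types, the `Map<String, String>` standing in for a `TaskAttemptContext` and job configuration, and the `sketch.committer.factory.class` key are all hypothetical names chosen for the example. Only the overall shape (a factory chosen by a configuration option, picking a committer from the destination URI) comes from the description.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class CommitterFactorySketch {

    /** Hypothetical stand-in for an output committer; not the Hadoop class. */
    public interface Committer {
        String name();
    }

    /** Stand-in for the classic rename-based committer. */
    public static final class FileCommitter implements Committer {
        public String name() { return "FileOutputCommitter"; }
    }

    /** Stand-in for an S3A-specific committer. */
    public static final class S3ACommitter implements Committer {
        public String name() { return "S3AOutputCommitter"; }
    }

    /** Factory taking a destination URI plus a context (here just a map). */
    public interface CommitterFactory {
        Committer create(URI destination, Map<String, String> context);
    }

    /** Default factory: always returns the file committer. */
    public static final class DefaultFactory implements CommitterFactory {
        public Committer create(URI destination, Map<String, String> context) {
            return new FileCommitter();
        }
    }

    /** S3A factory: picks the committer from the destination URI scheme. */
    public static final class S3AFactory implements CommitterFactory {
        public Committer create(URI destination, Map<String, String> context) {
            return "s3a".equalsIgnoreCase(destination.getScheme())
                    ? new S3ACommitter()
                    : new FileCommitter();
        }
    }

    /** Hypothetical configuration key naming the factory class. */
    public static final String FACTORY_KEY = "sketch.committer.factory.class";

    /** Loads the configured factory, or the default one if none is named. */
    public static CommitterFactory loadFactory(Map<String, String> conf) {
        String className = conf.get(FACTORY_KEY);
        if (className == null) {
            return new DefaultFactory();
        }
        try {
            return Class.forName(className)
                    .asSubclass(CommitterFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot create committer factory: " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FACTORY_KEY, S3AFactory.class.getName());
        CommitterFactory factory = loadFactory(conf);
        for (String dest : new String[] {"s3a://bucket/out", "hdfs://nn/out"}) {
            System.out.println(dest + " -> "
                    + factory.create(URI.create(dest), conf).name());
        }
    }
}
```

With no factory named in the configuration, `loadFactory` falls back to the default, so existing jobs would keep the current behaviour; only jobs that opt in by naming a factory see a different committer.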


