[ https://issues.apache.org/jira/browse/FLINK-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711474#comment-14711474 ]
ASF GitHub Bot commented on FLINK-2394:
---------------------------------------
GitHub user fhueske opened a pull request:
https://github.com/apache/flink/pull/1056
[FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.
Right now, Flink's wrappers for Hadoop OutputFormats always use a
`FileOutputCommitter`.
- In the `mapreduce` API, Hadoop OutputFormats have a method
`getOutputCommitter()` which can be overridden and which returns a
`FileOutputCommitter` by default.
- In the `mapred` API, the `OutputCommitter` should be obtained from the
`JobConf`. If no custom committer is set, a `FileOutputCommitter` is returned.
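The two lookup paths above can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not Flink or Hadoop code: `Committer`, `FileCommitter`, `MongoCommitter`, `NewApiOutputFormat`, `MongoNewApiOutputFormat` and `Conf` are hypothetical stand-ins. In Hadoop itself the `mapreduce` method is `OutputFormat.getOutputCommitter(TaskAttemptContext)`, and `JobConf.getOutputCommitter()` instantiates the configured committer class via reflection. To stay self-contained, the sketch drops the context argument and uses a `Supplier` instead of reflection.

```java
import java.util.function.Supplier;

// Hypothetical model of committer lookup in Hadoop's two APIs (not real Hadoop types).
class CommitterLookupSketch {

    // Stand-in for Hadoop's OutputCommitter.
    interface Committer { String name(); }

    static final class FileCommitter implements Committer {
        public String name() { return "FileOutputCommitter"; }
    }

    static final class MongoCommitter implements Committer {
        public String name() { return "MongoOutputCommitter"; }
    }

    // mapreduce-style: the OutputFormat decides, and subclasses may override it.
    static class NewApiOutputFormat {
        Committer getOutputCommitter() { return new FileCommitter(); }
    }

    static class MongoNewApiOutputFormat extends NewApiOutputFormat {
        @Override
        Committer getOutputCommitter() { return new MongoCommitter(); }
    }

    // mapred-style: the committer comes from the job configuration and
    // falls back to the file-based committer when nothing is configured.
    static final class Conf {
        private Supplier<? extends Committer> committerFactory;

        void setOutputCommitter(Supplier<? extends Committer> factory) {
            this.committerFactory = factory;
        }

        Committer getOutputCommitter() {
            return committerFactory != null ? committerFactory.get() : new FileCommitter();
        }
    }

    public static void main(String[] args) {
        System.out.println(new NewApiOutputFormat().getOutputCommitter().name());      // FileOutputCommitter
        System.out.println(new MongoNewApiOutputFormat().getOutputCommitter().name()); // MongoOutputCommitter
        Conf conf = new Conf();
        System.out.println(conf.getOutputCommitter().name()); // FileOutputCommitter
        conf.setOutputCommitter(MongoCommitter::new);
        System.out.println(conf.getOutputCommitter().name()); // MongoOutputCommitter
    }
}
```

In both paths a wrapper that asks for the committer, rather than constructing one itself, falls back to the file-based committer only when nothing custom is provided.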
This PR uses the respective methods to obtain the correct
`OutputCommitter`. Since `FileOutputCommitter` is the default in both cases,
the original semantics are preserved if no custom committer is implemented or
set by the user.
I also added convenience constructors to the `mapred` wrappers that set the
`OutputCommitter` in the `JobConf`.
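One possible shape for such a convenience constructor follows. It is a hypothetical model, not the actual Flink signature: `MapredWrapperSketch`, `Conf`, `Committer`, `FileCommitter` and `CustomCommitter` are illustrative names only. The extra constructor just writes the committer into the configuration. The wrapper therefore keeps a single lookup path (`conf.getOutputCommitter()`) whichever constructor was used.

```java
import java.util.function.Supplier;

// Hypothetical model of a mapred wrapper with a convenience constructor (not Flink code).
class ConvenienceConstructorSketch {

    interface Committer { String name(); }

    static final class FileCommitter implements Committer {
        public String name() { return "FileOutputCommitter"; }
    }

    static final class CustomCommitter implements Committer {
        public String name() { return "CustomOutputCommitter"; }
    }

    // Models a JobConf whose committer defaults to the file-based one.
    static final class Conf {
        private Supplier<? extends Committer> committer = FileCommitter::new;

        void setOutputCommitter(Supplier<? extends Committer> c) { committer = c; }

        Committer getOutputCommitter() { return committer.get(); }
    }

    static final class MapredWrapperSketch {
        private final Conf conf;

        // Plain constructor: uses whatever the configuration already specifies.
        MapredWrapperSketch(Conf conf) { this.conf = conf; }

        // Convenience constructor: records the committer in the configuration first.
        MapredWrapperSketch(Supplier<? extends Committer> committer, Conf conf) {
            this(conf);
            conf.setOutputCommitter(committer);
        }

        // The single place where this model obtains its committer.
        Committer committer() { return conf.getOutputCommitter(); }
    }

    public static void main(String[] args) {
        System.out.println(new MapredWrapperSketch(new Conf()).committer().name());                       // FileOutputCommitter
        System.out.println(new MapredWrapperSketch(CustomCommitter::new, new Conf()).committer().name()); // CustomOutputCommitter
    }
}
```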
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fhueske/flink hadoopOutCommitter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1056.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1056
----
commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske
Date: 2015-07-30T19:47:01Z
[FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.
----
> HadoopOutputFormat OutputCommitter defaults to FileOutputCommitter
> ------------------------------------------------------------------
>
> Key: FLINK-2394
> URL: https://issues.apache.org/jira/browse/FLINK-2394
> Project: Flink
> Issue Type: Bug
> Components: Hadoop Compatibility
> Affects Versions: 0.9.0
> Reporter: Stefano Bortoli
> Assignee: Fabian Hueske
> Fix For: 0.10, 0.9.1
>
>
> MongoOutputFormat does not write back to the collection because the
> HadoopOutputFormat wrapper does not allow setting the MongoOutputCommitter;
> the committer is always set to FileOutputCommitter. Therefore, when close()
> and finalizeGlobal() are executed, the commit does not happen and the Mongo
> collection stays untouched.
> A simple solution would be to:
> 1 - add a constructor to HadoopOutputFormatBase and HadoopOutputFormat
> that takes the OutputCommitter as a parameter
> 2 - change the outputCommitter field of HadoopOutputFormatBase to the
> generic OutputCommitter type
> 3 - remove the default assignment of FileOutputCommitter() to the
> outputCommitter in open() and finalizeGlobal(), or keep it as a default
> when no committer is specified.
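The three proposed steps can be sketched as follows. All names (`OutputFormatBaseSketch`, `Committer`, `FileCommitter`, `RecordingCommitter`) are hypothetical. `RecordingCommitter` stands in for a custom committer such as `MongoOutputCommitter` and only logs which commit calls reach it. The model maps `close()` to a task commit and `finalizeGlobal()` to a job commit, and it models nothing else of the real lifecycle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposal in this issue (not Flink code).
class ProposedFixSketch {

    // Stand-in for Hadoop's OutputCommitter, reduced to two commit calls.
    interface Committer {
        void commitTask();
        void commitJob();
    }

    // Default committer; in this model it does nothing.
    static final class FileCommitter implements Committer {
        public void commitTask() { }
        public void commitJob() { }
    }

    // Custom committer that records which commit calls it received.
    static final class RecordingCommitter implements Committer {
        final List<String> calls = new ArrayList<>();
        public void commitTask() { calls.add("commitTask"); }
        public void commitJob() { calls.add("commitJob"); }
    }

    static class OutputFormatBaseSketch {
        // Step 2: the field has the generic committer type, not FileCommitter.
        private Committer outputCommitter;

        OutputFormatBaseSketch() { this(null); }

        // Step 1: the committer can be passed in.
        OutputFormatBaseSketch(Committer committer) { this.outputCommitter = committer; }

        // Step 3: fall back to the file committer only if none was supplied.
        private Committer committer() {
            if (outputCommitter == null) {
                outputCommitter = new FileCommitter();
            }
            return outputCommitter;
        }

        void close() { committer().commitTask(); }

        void finalizeGlobal() { committer().commitJob(); }

        Committer currentCommitter() { return committer(); }
    }

    public static void main(String[] args) {
        RecordingCommitter custom = new RecordingCommitter();
        OutputFormatBaseSketch format = new OutputFormatBaseSketch(custom);
        format.close();
        format.finalizeGlobal();
        System.out.println(custom.calls); // [commitTask, commitJob]
        System.out.println(new OutputFormatBaseSketch().currentCommitter() instanceof FileCommitter); // true
    }
}
```

The merged pull request took a different route from this proposal. Rather than accepting a committer through the constructor, it obtains the committer from the wrapped `mapreduce` OutputFormat or from the `JobConf`.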
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)