[ 
https://issues.apache.org/jira/browse/FLINK-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711474#comment-14711474
 ] 

ASF GitHub Bot commented on FLINK-2394:
---------------------------------------

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/1056

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

    Right now, Flink's wrappers for Hadoop OutputFormats always use a
    `FileOutputCommitter`.
    
    - In the `mapreduce` API, Hadoop OutputFormats have a method
      `getOutputCommitter()` which can be overridden and returns a
      `FileOutputCommitter` by default.
    - In the `mapred` API, the `OutputCommitter` should be obtained from the
      `JobConf`. If nothing custom is set, a `FileOutputCommitter` is returned.
    
    This PR uses the respective methods to obtain the correct
    `OutputCommitter`. Since `FileOutputCommitter` is the default in both
    cases, the original semantics are preserved if no custom committer is
    implemented or set by the user.
    I also added convenience constructors to the `mapred` wrappers that set
    the `OutputCommitter` in the `JobConf`.
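
The lookup rule described above can be sketched in a few lines of Java. This is
only an illustration under stated assumptions, not the code in this pull
request: every type below (`Committer`, `FileCommitter`, `MongoCommitter`,
`FileFormat`, `MongoFormat`, `Conf`) is a hypothetical stand-in for the
corresponding Hadoop class, so the example compiles without Hadoop on the
classpath. The real `mapreduce` method takes a `TaskAttemptContext`, and the
real `JobConf` stores a committer class rather than an instance.

```java
public class CommitterLookupSketch {

    /** Hypothetical stand-in for Hadoop's OutputCommitter. */
    public interface Committer {
        String name();
    }

    /** Stand-in for the default FileOutputCommitter. */
    public static final class FileCommitter implements Committer {
        @Override
        public String name() { return "FileOutputCommitter"; }
    }

    /** Stand-in for a custom committer such as the one from the issue. */
    public static final class MongoCommitter implements Committer {
        @Override
        public String name() { return "MongoOutputCommitter"; }
    }

    /** mapreduce style: the output format itself provides the committer. */
    public static class FileFormat {
        public Committer getOutputCommitter() { return new FileCommitter(); }
    }

    /** A format that overrides the default committer. */
    public static class MongoFormat extends FileFormat {
        @Override
        public Committer getOutputCommitter() { return new MongoCommitter(); }
    }

    /** mapred style: the job configuration holds the committer. */
    public static class Conf {
        private Committer committer; // null means "nothing custom set"

        public void setOutputCommitter(Committer committer) {
            this.committer = committer;
        }

        public Committer getOutputCommitter() {
            // Fall back to the file committer when nothing custom is set.
            return committer != null ? committer : new FileCommitter();
        }
    }

    /** Wrapper logic for the mapreduce API: ask the format. */
    public static Committer forMapreduce(FileFormat format) {
        return format.getOutputCommitter();
    }

    /** Wrapper logic for the mapred API: ask the configuration. */
    public static Committer forMapred(Conf conf) {
        return conf.getOutputCommitter();
    }

    public static void main(String[] args) {
        System.out.println(forMapreduce(new FileFormat()).name());
        System.out.println(forMapreduce(new MongoFormat()).name());

        Conf conf = new Conf();
        System.out.println(forMapred(conf).name());
        conf.setOutputCommitter(new MongoCommitter());
        System.out.println(forMapred(conf).name());
    }
}
```

In both paths the wrapper never hard-codes a committer; it only falls back to
the file-based one when the format or the configuration supplies nothing else,
which is why the existing behavior is unchanged for users without a custom
committer.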

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink hadoopOutCommitter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1056.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1056
    
----
commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske <[email protected]>
Date:   2015-07-30T19:47:01Z

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

----


> HadoopOutFormat OutputCommitter is default to FileOutputCommiter
> ----------------------------------------------------------------
>
>                 Key: FLINK-2394
>                 URL: https://issues.apache.org/jira/browse/FLINK-2394
>             Project: Flink
>          Issue Type: Bug
>          Components: Hadoop Compatibility
>    Affects Versions: 0.9.0
>            Reporter: Stefano Bortoli
>            Assignee: Fabian Hueske
>             Fix For: 0.10, 0.9.1
>
>
> MongoOutputFormat does not write back to the collection because the 
> HadoopOutputFormat wrapper does not allow setting the MongoOutputCommitter 
> and always defaults to FileOutputCommitter. Therefore, when close and 
> finalizeGlobal execute, the commit does not happen and the Mongo collection 
> stays untouched. 
> A simple solution would be to:
> 1 - create a constructor of HadoopOutputFormatBase and HadoopOutputFormat 
> that takes the OutputCommitter as a parameter
> 2 - change the outputCommitter field of HadoopOutputFormatBase to be a 
> generic OutputCommitter
> 3 - remove the default assignment of the outputCommitter to 
> FileOutputCommitter() in open() and finalizeGlobal(), or keep it as the 
> default when no specific committer is assigned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
