[jira] [Commented] (FLINK-11737) Support org.apache.hadoop.mapreduce.lib.output.MultipleOutputs output

vinoyang (JIRA) Mon, 25 Feb 2019 18:20:21 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777496#comment-16777496
 ]


vinoyang commented on FLINK-11737:
----------------------------------

[~StephanEwen] updated. Constructing 
{{org.apache.hadoop.mapreduce.lib.output.MultipleOutputs}} in Hadoop requires 
an instance of the {{TaskInputOutputContext}} interface, and the most common 
implementation of this interface is {{ReduceContextImpl}}. Constructing 
{{ReduceContextImpl}} in turn requires a {{RawKeyValueIterator}}, i.e. an 
iterator over the input. Flink's lowest-level {{OutputFormat}}, however, 
follows a one-record-at-a-time output model ({{OutputFormat#writeRecord}}), 
so no iterator is available there. Currently, the only way I can use 
{{MultipleOutputs}} is through a {{MapPartitionFunction}}, which provides an 
{{Iterator}}. What do you think about this issue? cc [~fhueske]
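
To make the mismatch concrete, here is a minimal sketch in plain Java. It 
deliberately does not use any Hadoop or Flink classes: {{NamedOutputs}}, 
{{writePartition}} and the routing rule are hypothetical stand-ins, assumed 
only for illustration. It shows the partition-level pattern, in which one 
writer is opened per partition, fed from an {{Iterator}}, and closed at the 
end, which a per-record {{writeRecord}} call cannot express directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultipleOutputsSketch {

    /** Hypothetical stand-in for a writer that routes records to named outputs. */
    static final class NamedOutputs implements AutoCloseable {
        final Map<String, List<String>> outputs = new TreeMap<>();
        private boolean closed = false;

        void write(String namedOutput, String value) {
            if (closed) {
                throw new IllegalStateException("writer already closed");
            }
            outputs.computeIfAbsent(namedOutput, k -> new ArrayList<>()).add(value);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /**
     * Partition-level processing: the whole partition arrives as an Iterator,
     * so a single writer can be created up front, used for every record and
     * closed once the iterator is exhausted.
     */
    public static Map<String, List<String>> writePartition(Iterator<String> records) {
        try (NamedOutputs out = new NamedOutputs()) {
            while (records.hasNext()) {
                String record = records.next();
                // Assumed routing rule, for illustration only.
                String name = record.startsWith("err") ? "errors" : "data";
                out.write(name, record);
            }
            return out.outputs;
        }
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "err-1", "b");
        // prints {data=[a, b], errors=[err-1]}
        System.out.println(writePartition(partition.iterator()));
    }
}
```

In a real job, the body of {{writePartition}} would roughly correspond to 
what a {{MapPartitionFunction}} does with its input {{Iterator}}. How the 
Hadoop context is built around it is the open question here.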

> Support org.apache.hadoop.mapreduce.lib.output.MultipleOutputs output
> ---------------------------------------------------------------------
>
>                 Key: FLINK-11737
>                 URL: https://issues.apache.org/jira/browse/FLINK-11737
>             Project: Flink
>          Issue Type: Improvement
>          Components: Batch Connectors and Input/Output Formats
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> This issue is to improve Flink's compatibility with Hadoop. Currently, for 
> the old version of the Hadoop API, there is 
> {{org.apache.hadoop.mapred.lib.MultipleOutputFormat}}, which can be used 
> directly. However, the new Hadoop API's 
> {{org.apache.hadoop.mapreduce.lib.output.MultipleOutputs}} is not currently 
> supported by Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
