[jira] [Commented] (FLINK-31749) The Using Hadoop OutputFormats example is not avaliable for DataStream

Etienne Chauchot (Jira) Fri, 14 Apr 2023 01:33:06 -0700


    [ https://issues.apache.org/jira/browse/FLINK-31749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712265#comment-17712265 ]


Etienne Chauchot commented on FLINK-31749:
------------------------------------------

[~martijnvisser] I'd say both:
* DataStream output is supported through DataStream#addSink(SinkFunction).
* There is no SinkFunction for Hadoop, so the Hadoop OutputFormat can only be 
used in batch mode with the DataSet API.
=> So I'd remove the output section of the Hadoop formats page for the 
DataStream API. The input section code seems correct, but I don't know whether 
it would work in streaming mode, as there would not be any continuously 
incoming data. Should we remove Hadoop as a whole from the DataStream docs?
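
To make the gap in the second bullet concrete, here is a minimal sketch of an 
adapter that drives an open/writeRecord/close style output format from a 
per-record sink. SimpleSink, SimpleOutputFormat, OutputFormatSink and 
ListOutputFormat are hypothetical stand-ins written for this illustration, not 
Flink or Hadoop classes, and the sketch ignores checkpointing and job 
finalization, which a real streaming sink would have to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: every type below is a hypothetical stand-in,
// not a Flink or Hadoop class.
final class OutputFormatSinkSketch {

    // Stand-in for a streaming sink: invoked once per incoming record.
    interface SimpleSink<T> {
        void invoke(T value) throws Exception;
        void close() throws Exception;
    }

    // Stand-in for a batch-style output format: open, write records, close.
    interface SimpleOutputFormat<T> {
        void open() throws Exception;
        void writeRecord(T record) throws Exception;
        void close() throws Exception;
    }

    // Adapter that lets a batch-style format be used where a sink is expected.
    static final class OutputFormatSink<T> implements SimpleSink<T> {
        private final SimpleOutputFormat<T> format;
        private boolean opened;

        OutputFormatSink(SimpleOutputFormat<T> format) {
            this.format = format;
        }

        @Override
        public void invoke(T value) throws Exception {
            // Open lazily on the first record, then forward every record.
            if (!opened) {
                format.open();
                opened = true;
            }
            format.writeRecord(value);
        }

        @Override
        public void close() throws Exception {
            // Close the format only if it was actually opened.
            if (opened) {
                format.close();
                opened = false;
            }
        }
    }

    // In-memory format used only to exercise the adapter.
    static final class ListOutputFormat implements SimpleOutputFormat<String> {
        final List<String> lines = new ArrayList<>();
        boolean closed;

        @Override
        public void open() {
            closed = false;
        }

        @Override
        public void writeRecord(String record) {
            lines.add(record);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) throws Exception {
        ListOutputFormat format = new ListOutputFormat();
        SimpleSink<String> sink = new OutputFormatSink<>(format);
        sink.invoke("hello 1");
        sink.invoke("world 2");
        sink.close();
        System.out.println(format.lines);
    }
}
```

Whether such a wrapper would be acceptable for Hadoop output in streaming mode 
is a separate question; the sketch only shows the shape of the missing piece.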

By the way, I checked the Cassandra case, which I know well, and I can see 
that we are missing docs about the DataSet input/output formats.

> The Using Hadoop OutputFormats example is not avaliable for DataStream
> ----------------------------------------------------------------------
>
>                 Key: FLINK-31749
>                 URL: https://issues.apache.org/jira/browse/FLINK-31749
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.17.0, 1.15.4
>            Reporter: junzhong qin
>            Priority: Not a Priority
>
> The following example, from the doc 
> [https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/formats/hadoop/#using-hadoop-outputformats],
> shows how to use Hadoop’s {{TextOutputFormat}}. But {{DataStream}} has no 
> {{output()}} method.
> {code:java}
> // Obtain the result we want to emit
> DataStream<Tuple2<Text, IntWritable>> hadoopResult = [...]
> // Set up the Hadoop TextOutputFormat.
> HadoopOutputFormat<Text, IntWritable> hadoopOF =
>   // create the Flink wrapper.
>   new HadoopOutputFormat<Text, IntWritable>(
>     // set the Hadoop OutputFormat and specify the job.
>     new TextOutputFormat<Text, IntWritable>(), job
>   );
> hadoopOF.getConfiguration().set("mapreduce.output.textoutputformat.separator", " ");
> TextOutputFormat.setOutputPath(job, new Path(outputPath));
> // Emit data using the Hadoop TextOutputFormat.
> hadoopResult.output(hadoopOF);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
