[ 
https://issues.apache.org/jira/browse/FLINK-27827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543977#comment-17543977
 ] 

Martijn Visser commented on FLINK-27827:
----------------------------------------

With regards to:

> I understand there is a DataSource API [3] that supports the specification of 
> the boundedness of an input, but that would require all connectors to revise 
> their APIs to leverage it which would take some time.

This has been a deliberate choice; the DataSet API has been deprecated for quite 
some time, and FLIP-27 was created to cover both bounded and unbounded sources. 
Most Apache-maintained Flink connectors have either already been migrated or are 
in the process of being migrated. 
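To illustrate the point about FLIP-27: a source built on the new Data Source API declares its own boundedness, so the same pipeline can run in batch or streaming execution mode without any extra flag on {{StreamExecutionEnvironment}}. A minimal sketch (assuming Flink 1.15+, where {{FileSource}} and {{TextLineInputFormat}} are available; the input path is illustrative):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // A FileSource built without continuous monitoring reports itself
        // as BOUNDED, so batch execution mode just works.
        FileSource<String> source = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(),
                        new Path("/tmp/input"))  // illustrative path
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source")
           .print();
        env.execute();
    }
}
```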

> StreamExecutionEnvironment method supporting explicit Boundedness
> -----------------------------------------------------------------
>
>                 Key: FLINK-27827
>                 URL: https://issues.apache.org/jira/browse/FLINK-27827
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Andreas Hailu
>            Priority: Minor
>
> When creating a {{DataStreamSource}}, an explicitly bounded input is only 
> returned if the {{InputFormat}} provided implements {{FileInputFormat}}. 
> This results in runtime exceptions when trying to run applications in 
> Batch execution mode while using non-{{FileInputFormat}}s, e.g. Apache 
> Iceberg [1], Flink's Hadoop MapReduce compatibility API's inputs [2], etc.
> I understand there is a {{DataSource}} API [3] that supports the 
> specification of the boundedness of an input, but that would require all 
> connectors to revise their APIs to leverage it, which would take some time.
> My organization is in the middle of migrating from the {{DataSet}} API to 
> the {{DataStream}} API, and we've found this to be a challenge, as nearly 
> all of our inputs are determined to be unbounded because we use 
> {{InputFormat}}s that are not {{FileInputFormat}}s.
> Our work-around was to provide a local patch in 
> {{StreamExecutionEnvironment}} with a method supporting explicitly bounded 
> inputs.
> As this helped us implement a Batch {{DataStream}} solution, perhaps it is 
> something that may be helpful to others?
>  
> [1] [https://iceberg.apache.org/docs/latest/flink/#reading-with-datastream]
> [2] 
> [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/dataset/hadoop_map_reduce/]
>  
> [3] 
> [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/sources/#the-data-source-api]
>  
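The local patch described in the quoted report might look roughly like the sketch below: an overload on {{StreamExecutionEnvironment}} that accepts the desired {{Boundedness}} instead of inferring it from {{FileInputFormat}}. This is an assumption about the reporter's patch, not a confirmed Flink API; the delegated-to {{addSource}} overload is likewise assumed:

```java
// Hypothetical patch sketch inside StreamExecutionEnvironment: let the caller
// declare boundedness explicitly for any InputFormat-based source.
public <OUT> DataStreamSource<OUT> createInput(
        InputFormat<OUT, ?> inputFormat,
        TypeInformation<OUT> typeInfo,
        Boundedness boundedness) {  // e.g. Boundedness.BOUNDED for batch jobs

    // Wrap the legacy InputFormat in a SourceFunction, as createInput does
    // today, but pass the caller-supplied boundedness through instead of
    // deriving it from `inputFormat instanceof FileInputFormat`.
    InputFormatSourceFunction<OUT> function =
            new InputFormatSourceFunction<>(inputFormat, typeInfo);
    return addSource(function, "Custom Source", typeInfo, boundedness);
}
```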



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
