[https://issues.apache.org/jira/browse/FLINK-27827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544337#comment-17544337]

Andreas Hailu commented on FLINK-27827:
---------------------------------------

Hi [~gaoyunhaii] & [~martijnvisser], very well. Thanks for your input!

> StreamExecutionEnvironment method supporting explicit Boundedness
> -----------------------------------------------------------------
>
>                 Key: FLINK-27827
>                 URL: https://issues.apache.org/jira/browse/FLINK-27827
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Andreas Hailu
>            Priority: Minor
>
> When creating a {{DataStreamSource}}, an explicitly bounded input is only
> returned if the provided {{InputFormat}} is a {{FileInputFormat}}.
> This results in runtime exceptions when trying to run applications in
> Batch execution mode with non-{{FileInputFormat}} inputs, e.g. Apache
> Iceberg [1], inputs from Flink's Hadoop MapReduce compatibility API [2], etc.
> I understand there is a Data Source API [3] ({{Source}}) that supports
> specifying the boundedness of an input, but using it would require all
> connectors to revise their APIs, which would take some time.
> My organization is in the middle of migrating from the {{DataSet}} API to the
> {{DataStream}} API, and we've found this to be a challenge: nearly all of our
> inputs are treated as unbounded because we use {{InputFormat}}s that are not
> {{FileInputFormat}}s.
> Our work-around was a local patch to {{StreamExecutionEnvironment}} that adds
> a method accepting explicitly bounded inputs.
> Since this let us implement a batch {{DataStream}} solution, perhaps it would
> be helpful to others as well.
>  
> [1] [https://iceberg.apache.org/docs/latest/flink/#reading-with-datastream]
> [2] [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/dataset/hadoop_map_reduce/]
> [3] [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/sources/#the-data-source-api]
>  
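For readers less familiar with this code path, below is a minimal, standalone Java model of the behavior described in the issue. It does not depend on Flink: {{InputFormat}}, {{FileInputFormat}}, {{Boundedness}} and the {{Source}} holder are simplified stand-ins named after Flink's classes, {{TextFileFormat}} and {{TableScanFormat}} are hypothetical formats, and the two-argument {{createInput}} overload is only an illustration of the proposed explicit-boundedness method, not an existing Flink API. The batch check is likewise a simplified assumption, not Flink's actual validation.

```java
import java.util.Objects;

public class BoundednessSketch {

    // Mirrors the two constants of Flink's Boundedness enum (simplified stand-in).
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    // Stand-in for Flink's InputFormat interface (no methods needed for this model).
    interface InputFormat<T> {}

    // Stand-in for Flink's FileInputFormat base class.
    abstract static class FileInputFormat<T> implements InputFormat<T> {}

    // Hypothetical file-based format: treated as bounded by the inferred path.
    static class TextFileFormat extends FileInputFormat<String> {}

    // Hypothetical non-file format, e.g. a table-format or Hadoop wrapper.
    static class TableScanFormat implements InputFormat<String> {}

    // Simplified stand-in for DataStreamSource: only records the chosen boundedness.
    static final class Source {
        final InputFormat<?> format;
        final Boundedness boundedness;

        Source(InputFormat<?> format, Boundedness boundedness) {
            this.format = Objects.requireNonNull(format);
            this.boundedness = Objects.requireNonNull(boundedness);
        }
    }

    // Inferred path, as described in the issue: only FileInputFormat inputs are bounded.
    static Source createInput(InputFormat<?> format) {
        Boundedness inferred = (format instanceof FileInputFormat)
                ? Boundedness.BOUNDED
                : Boundedness.CONTINUOUS_UNBOUNDED;
        return new Source(format, inferred);
    }

    // Hypothetical explicit overload in the spirit of the proposal: the caller decides.
    static Source createInput(InputFormat<?> format, Boundedness boundedness) {
        return new Source(format, boundedness);
    }

    // Simplified batch-mode check: this model only accepts bounded sources in batch.
    static void checkBatchCompatible(Source source) {
        if (source.boundedness != Boundedness.BOUNDED) {
            throw new IllegalStateException("unbounded source in BATCH mode (model check)");
        }
    }

    public static void main(String[] args) {
        InputFormat<String> tableScan = new TableScanFormat();

        // Inferred boundedness: the non-file format is treated as unbounded and rejected.
        try {
            checkBatchCompatible(createInput(tableScan));
            System.out.println("inferred: accepted");
        } catch (IllegalStateException e) {
            System.out.println("inferred: rejected (" + e.getMessage() + ")");
        }

        // Explicit boundedness: the same format passes the batch check.
        Source explicit = createInput(tableScan, Boundedness.BOUNDED);
        checkBatchCompatible(explicit);
        System.out.println("explicit: " + explicit.boundedness);
    }
}
```

In this model the inferred path rejects the non-file format under the batch check, while the explicit overload lets the caller opt in to {{BOUNDED}}. Keeping the existing inference unchanged and adding an opt-in overload is one way to avoid affecting current users; how the real patch is structured may differ.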



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
