[ 
https://issues.apache.org/jira/browse/FLINK-27827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543479#comment-17543479
 ] 

Andreas Hailu commented on FLINK-27827:
---------------------------------------

If this is something that the community finds useful, I'm happy to be the one 
to pick this up.

> StreamExecutionEnvironment method supporting explicit Boundedness
> -----------------------------------------------------------------
>
>                 Key: FLINK-27827
>                 URL: https://issues.apache.org/jira/browse/FLINK-27827
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Andreas Hailu
>            Priority: Minor
>
> When creating a {{DataStreamSource}}, an explicitly bounded input is only 
> returned if the {{InputFormat}} provided is a {{FileInputFormat}}. 
> This results in runtime exceptions when trying to run applications in 
> Batch execution mode with inputs that are not {{FileInputFormat}}s, e.g. 
> Apache Iceberg [1], Flink's Hadoop MapReduce compatibility API [2], etc.
> I understand there is a {{DataSource}} API [3] that supports the 
> specification of the boundedness of an input, but that would require all 
> connectors to revise their APIs to leverage it, which would take some time.
> My organization is in the middle of migrating from the {{DataSet}} API to the 
> {{DataStream}} API, and we've found this to be a challenge, as nearly all of 
> our inputs are determined to be unbounded because we use {{InputFormat}}s 
> that are not {{FileInputFormat}}s. Our workaround was to provide a local 
> patch in {{StreamExecutionEnvironment}} with a method supporting explicitly 
> bounded inputs.
> As this helped us implement a Batch {{DataStream}} solution, perhaps this is 
> something that may be helpful for others?
>  
> [1] [https://iceberg.apache.org/docs/latest/flink/#reading-with-datastream]
> [2] [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/dataset/hadoop_map_reduce/]
> [3] [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/sources/#the-data-source-api]
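
For readers unfamiliar with the problem, the behavior described in the issue can be illustrated with a small stand-alone sketch. This is a simplified model under stated assumptions, not Flink's code and not the reporter's patch: every type and method below ({{BoundednessSketch}}, {{Source}}, {{TableInputFormat}}, both {{createInput}} overloads, {{checkBatchCompatible}}) is a hypothetical stand-in that only mirrors the names used in the description. It shows boundedness being inferred from the input format type, and how an overload taking an explicit boundedness would let a non-file input pass a batch-mode check.

```java
import java.util.Objects;

public class BoundednessSketch {

    /** Whether a source eventually ends or may produce data forever. */
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    /** Hypothetical stand-ins for input formats; these are not Flink's classes. */
    interface InputFormat {}
    static class FileInputFormat implements InputFormat {}
    static class TableInputFormat implements InputFormat {} // e.g. a non-file input

    /** Hypothetical source handle that records the boundedness assigned to it. */
    static final class Source {
        final InputFormat format;
        final Boundedness boundedness;

        Source(InputFormat format, Boundedness boundedness) {
            this.format = Objects.requireNonNull(format);
            this.boundedness = Objects.requireNonNull(boundedness);
        }
    }

    /** Models the reported behavior: only file-based formats are treated as bounded. */
    static Source createInput(InputFormat format) {
        Boundedness inferred = (format instanceof FileInputFormat)
                ? Boundedness.BOUNDED
                : Boundedness.CONTINUOUS_UNBOUNDED;
        return new Source(format, inferred);
    }

    /** Models the proposed addition: the caller states the boundedness explicitly. */
    static Source createInput(InputFormat format, Boundedness boundedness) {
        return new Source(format, boundedness);
    }

    /** Models a batch-mode check that rejects unbounded sources at runtime. */
    static void checkBatchCompatible(Source source) {
        if (source.boundedness != Boundedness.BOUNDED) {
            throw new IllegalStateException("Batch execution requires bounded sources");
        }
    }

    public static void main(String[] args) {
        Source inferred = createInput(new TableInputFormat());
        Source explicit = createInput(new TableInputFormat(), Boundedness.BOUNDED);
        System.out.println(inferred.boundedness);
        System.out.println(explicit.boundedness);
        checkBatchCompatible(explicit); // passes; the inferred source would throw
    }
}
```

In this model the caller, who knows the input is finite, supplies that knowledge directly rather than waiting for each connector to adopt a newer source API.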



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
