[ 
https://issues.apache.org/jira/browse/FLINK-22456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339708#comment-17339708
 ] 

Stephan Ewen commented on FLINK-22456:
--------------------------------------

We are trying to migrate away from the {{InputFormat}} API over time. The new 
Source API received many stability improvements in the latest release and 
will see broader connector support in the coming months.
It would be helpful for the community not to add more functionality to the old 
APIs, because otherwise we need to put more effort into maintaining them (from 
experience, every change has follow-up issues).

Directions we can explore to work around this:

  - In the new Source API, you can use the point when all splits have been 
returned (no more splits are available) to signal the finalization. Migrating 
to the new Source API would solve this. If you are concerned about 
stability, how complex would it be for you to keep this custom code for now 
and then migrate to the new API in the coming months?

  - For which part exactly do you need the finalization? Sources usually do not 
have side effects, so I am curious whether there is a different way to realize 
the mechanism.

  - It should be possible to write an adapter that wraps an {{InputFormat}} into 
a new Source. That adapter can then easily be extended for functions like 
{{FinalizeOnMaster}}, because it is not part of the core engine.
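
To make the adapter idea a bit more concrete, here is a minimal sketch of the 
enumerator side. It does not use the real Flink interfaces: {{SplitProvider}}, 
{{MasterHooks}} and {{AdapterEnumerator}} are hypothetical stand-ins invented 
for illustration, and a real adapter would have to implement the actual Source 
API instead. The sketch only shows the control flow: initialize once, hand out 
splits on request, and run the finalize hook once when no more splits are 
available.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Hypothetical stand-in for the split-creating part of an InputFormat.
interface SplitProvider<S> {
    List<S> createSplits(int minNumSplits);
}

// Hypothetical stand-in for InitializeOnMaster / FinalizeOnMaster style hooks.
interface MasterHooks {
    void initializeGlobal(int parallelism);
    void finalizeGlobal(int parallelism);
}

// Enumerator-style adapter (assumed design, not Flink code): it owns the
// pending splits and calls the hooks around the split hand-out.
final class AdapterEnumerator<S> {
    private final Queue<S> pending;
    private final MasterHooks hooks;
    private final int parallelism;
    private boolean finalized;

    AdapterEnumerator(SplitProvider<S> format, MasterHooks hooks, int parallelism) {
        this.hooks = hooks;
        this.parallelism = parallelism;
        // Run the global initialization before any split is created, so the
        // hook can influence how splits are generated.
        hooks.initializeGlobal(parallelism);
        this.pending = new ArrayDeque<>(format.createSplits(parallelism));
    }

    // Returns the next split, or an empty Optional when all splits have been
    // handed out. The finalize hook runs exactly once, on the first request
    // that finds no remaining split.
    Optional<S> nextSplit() {
        S split = pending.poll();
        if (split == null && !finalized) {
            finalized = true;
            hooks.finalizeGlobal(parallelism);
        }
        return Optional.ofNullable(split);
    }
}
```

Note that in this sketch the finalize hook fires when the last split has been 
handed out, which is not the same as all records having been read. Whether that 
is early enough or too early depends on what the finalization is needed for.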

Especially the last option would be quite cool and really helpful for the 
community. What do you think?

> Support InitializeOnMaster and FinalizeOnMaster to be used in InputFormat
> -------------------------------------------------------------------------
>
>                 Key: FLINK-22456
>                 URL: https://issues.apache.org/jira/browse/FLINK-22456
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream, Runtime / Coordination
>            Reporter: Li
>            Priority: Minor
>              Labels: pull-request-available
>
>         In _InputOutputFormatVertex_, _initializeGlobal_ and _finalizeGlobal_ 
> are only called when the format is an _OutputFormat_; they are not called 
> for an _InputFormat_.
>         FLINK-1722 says that _HadoopOutputFormats_ use them to do work 
> before and after the task, and that _initializeGlobal_ and 
> _finalizeGlobal_ are only supported in _OutputFormat_.
>         I don't know why _InputFormat_ doesn't support them. Can anyone tell 
> me why?
>         I think _InitializeOnMaster_ and _FinalizeOnMaster_ should also 
> be supported in _InputFormat_.
>         For example, in an offline task using _JdbcInputFormat_, the user can 
> use _initializeGlobal_ to query the total record count of the task and then 
> create InputSplits based on that count. While the task is running, the user 
> can add a progress metric by dividing the number of records read so far by 
> the total number of records, and can even estimate the remaining time of the 
> task. It is very helpful for users to view task progress and remaining 
> time through external systems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
