[ 
https://issues.apache.org/jira/browse/FLINK-22456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332959#comment-17332959
 ] 

Li commented on FLINK-22456:
----------------------------

        Maybe you mean that users can query and generate the splits with a 
SplitEnumerator in the new Data Source API. JdbcInputFormat is just an example; 
for other data source types, depending on their business needs, users do 
different things before and after the task.

        As the documentation says, the new Data Source API is currently in BETA 
status. Most of the existing source connectors are not yet (as of Flink 1.11) 
implemented using this new API, but using the previous API, based on 
SourceFunction. The same is true of users' own sources.

        For our own Flink jobs, we have modified the Flink source code, and we 
would now like to contribute this change back to Flink. It is only about a 
dozen lines of code, but it gives users more possibilities.

        Looking forward to your reply. CC [~lzljs3620320]

> Support InitializeOnMaster and FinalizeOnMaster to be used in InputFormat
> -------------------------------------------------------------------------
>
>                 Key: FLINK-22456
>                 URL: https://issues.apache.org/jira/browse/FLINK-22456
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Task
>            Reporter: Li
>            Priority: Minor
>              Labels: pull-request-available
>
>         In _InputOutputFormatVertex_, _initializeGlobal_ and _finalizeGlobal_ 
> are only called when the format is an _OutputFormat_; they are not called 
> for an _InputFormat_.
>         FLINK-1722 says that _HadoopOutputFormats_ use them to do something 
> before and after the task, and that _initializeGlobal_ and _finalizeGlobal_ 
> are only supported in _OutputFormat_.
>         I don't know why _InputFormat_ is not supported. Can anyone tell me 
> why?
>         In any case, I think _InitializeOnMaster_ and _FinalizeOnMaster_ 
> should also be supported in _InputFormat_.
>         For example, in an offline task using _JdbcInputFormat_, a user can 
> use _initializeGlobal_ to query the total record count of the task and then 
> create InputSplits based on that count. While the task is running, the user 
> can add a progress metric by dividing the number of records read so far by 
> the total number of records, and can even estimate the remaining time of the 
> task. Viewing task progress and remaining time through external systems is 
> very helpful for users.
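
The JdbcInputFormat example in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed patch: the `InitializeOnMaster` interface is declared locally as a stand-in that mirrors the name and method of Flink's interface so that the snippet compiles without Flink on the classpath, and `JdbcProgressSketch`, the count supplier, and the helper methods are hypothetical names. In a real job the count would probably come from something like a `SELECT COUNT(*)` query.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

// Local stand-in (assumption) mirroring Flink's InitializeOnMaster interface,
// declared here only so the sketch compiles without Flink.
interface InitializeOnMaster {
    void initializeGlobal(int parallelism) throws IOException;
}

// Hypothetical input-format-like class: it queries a total count once on the
// master, derives split ranges from it, and exposes progress helpers.
final class JdbcProgressSketch implements InitializeOnMaster {
    private final LongSupplier countQuery; // hypothetical: would wrap a COUNT(*) query
    private long totalCount = -1;
    private long[][] ranges = new long[0][];

    JdbcProgressSketch(LongSupplier countQuery) {
        this.countQuery = countQuery;
    }

    @Override
    public void initializeGlobal(int parallelism) {
        // Runs once before the task starts: fetch the total and plan the splits.
        totalCount = countQuery.getAsLong();
        ranges = splitRanges(totalCount, parallelism);
    }

    long getTotalCount() { return totalCount; }

    long[][] getRanges() { return ranges; }

    // Divides [0, total) into `parallelism` contiguous ranges, each given as
    // {startInclusive, endExclusive}; earlier ranges absorb the remainder.
    static long[][] splitRanges(long total, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        long[][] result = new long[parallelism][2];
        long base = total / parallelism;
        long remainder = total % parallelism;
        long start = 0;
        for (int i = 0; i < parallelism; i++) {
            long size = base + (i < remainder ? 1 : 0);
            result[i][0] = start;
            result[i][1] = start + size;
            start += size;
        }
        return result;
    }

    // Fraction of records read so far, clamped to [0, 1].
    static double progress(long recordsRead, long total) {
        if (total <= 0) {
            return 1.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) recordsRead / total));
    }

    // Linear estimate of the remaining time; returns -1 when nothing has been read yet.
    static long estimateRemainingMillis(long recordsRead, long total, long elapsedMillis) {
        if (recordsRead <= 0) {
            return -1;
        }
        long remaining = Math.max(0, total - recordsRead);
        return Math.round((double) elapsedMillis * remaining / recordsRead);
    }

    public static void main(String[] args) {
        JdbcProgressSketch format = new JdbcProgressSketch(() -> 1_000L);
        format.initializeGlobal(4);
        System.out.println("splits: " + format.getRanges().length);
        System.out.println("progress: " + progress(250, format.getTotalCount()));
        System.out.println("remaining ms: "
                + estimateRemainingMillis(250, format.getTotalCount(), 5_000));
    }
}
```

The remaining-time estimate assumes a constant read rate, which is a simplification. A real metric would likely smooth the rate over a recent window.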



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
