[ 
https://issues.apache.org/jira/browse/FLINK-36914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906722#comment-17906722
 ] 

Hongshun Wang edited comment on FLINK-36914 at 12/18/24 11:47 AM:
------------------------------------------------------------------

> sources that fail to produce any watermark (for any reason) won't participate 
> in {{{}MaxAllowedWatermark{}}}'s computation. 

Hi, [~xccui] , I have also met this. And I solve it by setting 
table.exec.source.idle-timeout, which will aslo emit watermark if the source is 
idle for timeout. 

 

Currently, for multiple task for same source, it will pause the read of the 
higher watermark split by invoking SourceReader#pauseOrResumeSplits.It seems 
that you want watermark align to multiple source  rather just one source?  you 
can set 
same watermarkGroup for these sources.


was (Author: JIRAUSER298968):
> sources that fail to produce any watermark (for any reason) won't participate 
> in {{{}MaxAllowedWatermark{}}}'s computation. 

Hi, [~xccui] , I have also met this. And I solve it by setting 
table.exec.source.idle-timeout, which will aslo emit watermark if the source is 
idle for timeout. 

 

Currently, for multiple task for same source, it will pause the read of the 
higher watermark split by invoking SourceReader#pauseOrResumeSplits.It seems 
that you want watermark align to multiple source  rather just one source?

> Sources with watermark alignment should wait for watermark generation
> ---------------------------------------------------------------------
>
>                 Key: FLINK-36914
>                 URL: https://issues.apache.org/jira/browse/FLINK-36914
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Xingcan Cui
>            Priority: Major
>
> When watermark alignment is enabled across multiple sources, event ingestion 
> should pause until each source generates at least one watermark. However, 
> currently, sources that fail to produce any watermark (for any reason) won't 
> participate in {{{}MaxAllowedWatermark{}}}'s computation. This happens 
> whenever a job restarts and not all the sources can start to read data at 
> exactly the same time.
> WatermarkAlignment is usually enabled when users need to control the state 
> size of a job by avoiding caching too much data that won't be used right now. 
> So when a source can't produce watermarks, it should hold off event ingestion 
> from other sources instead of allowing them to read more events.
> Regarding implementation, I'm unsure if the refactored Flink source API can 
> easily support "peek and generate watermarks". Some new mechanisms might need 
> to be introduced to support it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to