Xingcan Cui created FLINK-36914:
-----------------------------------

             Summary: Sources with watermark alignment should wait for 
watermark generation
                 Key: FLINK-36914
                 URL: https://issues.apache.org/jira/browse/FLINK-36914
             Project: Flink
          Issue Type: Improvement
          Components: API / DataStream
            Reporter: Xingcan Cui


When watermark alignment is enabled across multiple sources, event ingestion 
should pause until each source generates at least one watermark. However, 
currently, sources that fail to produce any watermark (for any reason) won't 
participate in {{{}MaxAllowedWatermark{}}}'s computation. This happens whenever 
a job restarts and not all the sources can start to read data at exactly the 
same time.

WatermarkAlignment is usually enabled when users need to control the state size 
of a job by avoiding caching too much data that won't be used right now. So 
when a source can't produce watermarks, it should hold off event ingestion from 
other sources instead of allowing them to read more events.

Regarding implementation, I'm unsure if the refactored Flink source API can 
easily support "peek and generate watermarks". Some new mechanisms might need 
to be introduced to support it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to