Xingcan Cui created FLINK-36914: ----------------------------------- Summary: Sources with watermark alignment should wait for watermark generation Key: FLINK-36914 URL: https://issues.apache.org/jira/browse/FLINK-36914 Project: Flink Issue Type: Improvement Components: API / DataStream Reporter: Xingcan Cui
When watermark alignment is enabled across multiple sources, event ingestion should pause until each source generates at least one watermark. However, currently, sources that fail to produce any watermark (for any reason) won't participate in {{{}MaxAllowedWatermark{}}}'s computation. This happens whenever a job restarts and not all the sources can start to read data at exactly the same time. WatermarkAlignment is usually enabled when users need to control the state size of a job by avoiding caching too much data that won't be used right now. So when a source can't produce watermarks, it should hold off event ingestion from other sources instead of allowing them to read more events. Regarding implementation, I'm unsure if the refactored Flink source API can easily support "peek and generate watermarks". Some new mechanisms might need to be introduced to support it. -- This message was sent by Atlassian Jira (v8.20.10#820010)