Tzu-Li (Gordon) Tai created FLINK-5017:
------------------------------------------
Summary: Introduce WatermarkStatus stream element to allow for
temporarily idle streaming sources
Key: FLINK-5017
URL: https://issues.apache.org/jira/browse/FLINK-5017
Project: Flink
Issue Type: New Feature
Components: Streaming
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
Priority: Blocker
Fix For: 1.2.0
A {{WatermarkStatus}} element informs receiving operators whether or not they
should continue to expect watermarks from the sending operator. There are 2
kinds of status, namely {{IDLE}} and {{ACTIVE}}. Watermark status elements are
generated at the sources, and may be propagated through the operators of the
topology using {{Output#emitWatermarkStatus(WatermarkStatus)}}.
Sources and downstream operators should emit either of the status elements once
it changes between "watermark-idle" and "watermark-active" states.
A source is considered "watermark-idle" if it will not emit records for an
indefinite amount of time. This is the case, for example, for Flink's Kafka
Consumer, where sources might initially have no assigned partitions to read
from, or no records can be read from the assigned partitions. Once the source
detects that it will resume emitting data, it is considered "watermark-active".
Downstream operators with multiple inputs (ex. head operators of a
{{OneInputStreamTask}} or {{TwoInputStreamTask}}) should not wait for
watermarks from an upstream operator that is "watermark-idle" when deciding
whether or not to advance the operator's current watermark. When a downstream
operator determines that all upstream operators are "watermark-idle" (i.e. when
all input channels have received the watermark idle status element), then the
operator is considered to also be "watermark-idle", as it will temporarily be
unable to advance its own watermark. This is always the case for operators that
only read from a single upstream operator. Once an operator is considered
"watermark-idle", it should itself forward its idle status to inform downstream
operators. The operator is considered to be back to "watermark-active" as soon
as at least one of its upstream operators resume to be "watermark-active" (i.e.
when at least one input channel receives the watermark active status element),
and should also forward its active status to inform downstream operators.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)