Filip Niksic created FLINK-14616:
------------------------------------
Summary: Clarify the ordering guarantees in the "The Broadcast
State Pattern"
Key: FLINK-14616
URL: https://issues.apache.org/jira/browse/FLINK-14616
Project: Flink
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.9.1
Reporter: Filip Niksic
When talking about the order of events in [The Broadcast State
Pattern|https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/broadcast_state.html#important-considerations],
the current documentation states that the downstream tasks must not assume the
broadcast events to be ordered. However, this seems to be imprecise. According
to the response I got from [~fhueske] to a
[question|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Ordered-events-in-broadcast-state-tp30879.html]
I sent to the Flink user mailing list:
{quote}The order of broadcasted inputs is not guaranteed when the operator that
broadcasts its output has a parallelism > 1 because the tasks that receive the
broadcasted input consume the records in "random" order from their input
channels.
{quote}
In particular, when the parallelism of the broadcasting operator is 1, the
order _is_ guaranteed.
[~fhueske] continues with his suggestions on how to ensure the correct ordering
of the broadcast events:
{quote}So there are two approaches:
1) make the operator that broadcasts its output run as an operator with
parallelism 1 (or add a MapOperator with parallelism 1 that just forwards its
input). This will cause all broadcasted records to go through the same network
channel and their order is guaranteed on each receiver.
2) use timestamps of broadcasted records for ordering and watermarks to reason
about completeness.
If the broadcasted data is (comparatively) small in volume (which is usually
given because otherwise broadcasting would be expensive), I'd go with the first
option.
The second approach is more difficult to implement.
{quote}
It would be great if the ordering guarantees could be clarified to avoid
confusion. This could be achieved by simply expanding the paragraph that talks
about the order of events in the "important considerations" section. More
ambitiously, the suggestions given by [~fhueske] could be turned into examples.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)