GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/951
[FLINK-2407] [streaming] Add an API switch to choose between "exactly once"
and "at least once".
Adds a switch to choose between **exactly once** and **at least once**
checkpointing mode.
Exactly Once
==========
Sets the checkpointing mode to "exactly once". This mode means that the
system will checkpoint the operator and user function state in such a way that,
upon recovery, every record will be reflected exactly once in the operator
state.
For example, if a user function counts the number of elements in a stream,
this number will consistently be equal to the number of actual elements in the
stream, regardless of failures and recovery.
Note that this does not mean that each record flows through the streaming
data flow only once. It means that upon recovery, the state of
operators/functions is restored such that the resumed data streams pick up
exactly after the last modification to the state.
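
The distinction between "processed once" and "reflected once in the state" can
be illustrated with a small simulation. This is only a sketch, not Flink code:
`run_with_failure`, `checkpoint_every`, and `fail_at` are hypothetical names,
and the model assumes a replayable source whose read position is checkpointed
together with the operator state.

```python
def run_with_failure(records, checkpoint_every, fail_at):
    """Count records from a replayable source, crashing once at offset `fail_at`.

    Hypothetical model: a checkpoint stores the source offset and the
    counter together, and recovery restores both at the same time.
    """
    count = 0            # operator state: number of records seen
    offset = 0           # read position in the replayable source
    snapshot = (0, 0)    # last checkpoint: (offset, count)
    processed = 0        # how often any record flowed through the operator
    failed = False

    while offset < len(records):
        if not failed and offset == fail_at:
            # Simulated failure: roll back state and source position together.
            offset, count = snapshot
            failed = True
            continue
        count += 1       # the "user function" counts the record
        offset += 1
        processed += 1
        if offset % checkpoint_every == 0:
            snapshot = (offset, count)
    return count, processed


count, processed = run_with_failure(list(range(10)), checkpoint_every=4, fail_at=7)
print(count, processed)  # prints: 10 13
```

The counter ends at 10, the true number of elements, although 13 records
flowed through the operator because three of them were replayed after the
rollback.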
Note that this mode does not guarantee exactly-once behavior in the
interaction with external systems; the guarantee covers only state in Flink's
operators and user functions. The reason is that a certain level of
"collaboration" is required between two systems to achieve exactly-once
guarantees. However, for certain systems, connectors can be written that
facilitate this collaboration.
This mode sustains high throughput. Depending on the data flow graph and
operations, this mode may increase the record latency, because operators need
to align their input streams, in order to create a consistent snapshot point.
The latency increase for simple dataflows (no repartitioning) is negligible.
For simple dataflows with repartitioning, the average latency remains small,
but the slowest records typically have an increased latency.
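
The alignment step described above can be sketched as a toy model. It assumes
that checkpoints are delimited by a barrier marker on every input channel and
that an operator holds back records from a channel whose barrier has already
arrived until the barriers from all channels are in. The names `BARRIER` and
`aligned_snapshot` are hypothetical and not taken from the Flink code base.

```python
BARRIER = "BARRIER"  # hypothetical marker that separates two checkpoints


def aligned_snapshot(events, num_channels=2):
    """Run a counting operator over (channel, item) events in arrival order.

    Returns (snapshot_count, final_count, delayed_records).
    """
    count = 0
    blocked = set()   # channels whose barrier has already arrived
    buffered = []     # records held back during alignment
    delayed = 0
    snapshot = None
    for channel, item in events:
        if item == BARRIER:
            blocked.add(channel)
            if len(blocked) == num_channels:
                snapshot = count          # consistent snapshot point
                delayed = len(buffered)
                count += len(buffered)    # release the held-back records
                buffered.clear()
                blocked.clear()
        elif channel in blocked:
            buffered.append(item)         # waits for the slower channel
        else:
            count += 1
    return snapshot, count, delayed


events = [
    (0, "a1"), (1, "b1"), (0, BARRIER),
    (0, "a2"), (0, "a3"), (1, "b2"),
    (1, BARRIER), (1, "b3"),
]
print(aligned_snapshot(events))  # prints: (3, 6, 2)
```

The snapshot contains exactly the three records that precede the barriers
(a1, b1, b2). Records a2 and a3 are the ones that pay the latency cost, since
they wait until the barrier on the slower channel arrives.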
At Least Once
===========
Sets the checkpointing mode to "at least once". This mode means that the
system will checkpoint the operator and user function state in a simpler way.
Upon failure and recovery, some records may be reflected multiple times in the
operator state.
For example, if a user function counts the number of elements in a stream,
this number will be equal to, or larger than, the actual number of elements in
the stream, in the presence of failure and recovery.
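
A toy model can show where the duplicates come from. It assumes that the
"simpler way" means the operator does not align its inputs: it keeps
processing every channel and takes its snapshot once the barrier from the last
channel has arrived, while sources replay everything after their own barrier
on recovery. The names `BARRIER`, `snapshot_without_alignment`, and
`replayed_on_recovery` are hypothetical.

```python
BARRIER = "BARRIER"  # hypothetical marker that separates two checkpoints


def snapshot_without_alignment(events, num_channels=2):
    """Count records without blocking any channel; snapshot on the last barrier."""
    count = 0
    seen = set()
    snapshot = None
    for channel, item in events:
        if item == BARRIER:
            seen.add(channel)
            if len(seen) == num_channels:
                snapshot = count   # may include records from after a barrier
        else:
            count += 1
    return snapshot


def replayed_on_recovery(events):
    """Records that follow the barrier on their own channel and are resent."""
    passed = set()
    replay = []
    for channel, item in events:
        if item == BARRIER:
            passed.add(channel)
        elif channel in passed:
            replay.append(item)
    return replay


events = [
    (0, "a1"), (1, "b1"), (0, BARRIER),
    (0, "a2"), (0, "a3"), (1, "b2"),
    (1, BARRIER), (1, "b3"),
]
actual = sum(1 for _, item in events if item != BARRIER)
restored = snapshot_without_alignment(events)
recovered = restored + len(replayed_on_recovery(events))
print(actual, restored, recovered)  # prints: 6 5 8
```

The snapshot already counts a2 and a3, which arrived after the barrier on
channel 0. After a failure they are replayed and counted a second time, so the
recovered count (8) exceeds the actual number of elements (6).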
This mode has minimal impact on latency and may be preferable in
very-low-latency scenarios, where a sustained very low latency (such as a few
milliseconds) is needed, and where occasional duplicate messages (on recovery)
do not matter.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink at_least_once_switch
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/951.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #951
----
commit b089efa6ef688d61f41463d644768845393a913b
Author: Stephan Ewen <[email protected]>
Date: 2015-07-29T12:12:42Z
[FLINK-2407] [streaming] Add an API switch to choose between "exactly once"
and "at least once".
commit 71bd9c01aa24c9420766c9c79ff80618341b8e69
Author: Stephan Ewen <[email protected]>
Date: 2015-07-29T12:49:23Z
[hotfix] Code cleanups in the StreamConfig
----