GitHub user senorcarbone opened a pull request:
https://github.com/apache/flink/pull/1668
[FLINK-3257] Add Exactly-Once Processing Guarantees for Iterative
DataStream Jobs
# **[WIP]**
This is a first version of the adapted snapshot algorithm to support
iterations. It is correct and works in practice...well, when memory capacity is
enough for its logging requirements but I am working on that, hopefully with a
little help from you. Before we go into the implementation details let me
describe briefly the new algorithm.
## Algorithm
Our existing checkpoint algorithm has a very fluid and straightforward
protocol. It just makes sure that all checkpoint barriers are aligned in each
operator so that all records before barriers (pre-shot) are processed before
taking a snapshot. Since waiting indefinitely for all records in-transit within
a cycle of an execution graph can violate termination (crucial liveness
property) we have to...save any unprocessed records for later during the
snapshot. In this take of the algorithm on Flink we assign that role to the
`Iteration Head`. The steps this version varies from the vanilla algorithm are
simply the following:
1. An `Iteration Head` receives a barrier from the system runtime (as
before) and:
- Goes into **Logging Mode**. That means that from that point on every
record it receives from its `Iteration Sink` is buffered in its own operator
state and **not** forwarded further until it goes back to normal mode.
- **Forwards** the barrier to its downstream nodes (this guarantees
liveness, otherwise we have a deadlock).
2. Eventually, the `Iteration Head` receives
## Example
blabla

blabla

## Current Implementation Details
## Open/Pending Issues
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/senorcarbone/flink ftloops
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1668.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1668
----
commit 38256e4c4bb00183794699027e8e4298787c66fa
Author: Paris Carbone <[email protected]>
Date: 2016-01-19T10:18:54Z
exactly-once processing test for stream iterations
commit dbf2625536289dc70724ef798fd02989e586d874
Author: Paris Carbone <[email protected]>
Date: 2016-02-18T15:49:19Z
[wip] adapt snapshot mechanism for iterative jobs
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---