GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/938
[FLINK-2406] [FLINK-2402] Abstract the BarrierBuffers and add a "at least
once" version (BarrierTracker)
Currently, the checkpointing mechanism provides "exactly once" guarantees.
Part of that is the step that temporarily "aligns" the data streams, performed
by the `BarrierBuffer`. This step increases the tuple latency temporarily.
By offering a version that does not provide "exactly-once", but only
"at-least-once", we can avoid the latency increase. This is relevant for
super-low-latency applications (< 10 ms) that tolerate duplicates.
This pull request does:
- Add an interface for the functionality of the barrier buffer, to allow
adding different implementations
- Add the `BarrierTracker` which only tracks checkpoint barriers, without
aligning the streams.
- Add broader tests for the BarrierBuffer, including trailing data and
barrier races.
- Checkpoint barriers are handled by the buffer directly, rather than
being returned and re-injected.
- Simplify logic in the BarrierBuffer and fix certain corner cases.
- Give access to spill directories properly via I/O manager, rather than
via GlobalConfiguration singleton.
- Rename the "BarrierBufferIOTest" to "BarrierBufferMassiveRandomTest"
- A lot of code style/robustness fixes (proplery define constants,
visibility, exception signatures)
@gyfora and @uce : The reworked variant of the BarrierBuffer eventually
returns `null` when all buffers and events have been consumed. The previous
impementation repeatedly returned the `EndOfPartitionEvent`. That seemed
inconsistent with the implementation of the "vanilla" InputGate, which
eventually returns `null` when all partitions have finished.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink latency
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/938.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #938
----
commit 9d9e34e85c812eedb7b0f6365952c853fc531627
Author: Stephan Ewen <[email protected]>
Date: 2015-07-12T17:33:38Z
[tests] Add a manual test to evaluate impact of checkpointing on latency
commit 4cafad08a1b64bcaf0c95fe0b211eb740f6774b2
Author: Stephan Ewen <[email protected]>
Date: 2015-07-26T16:58:37Z
[FLINK-2406] [streaming] Abstract and improve stream alignment via the
BarrierBuffer
- Add an interface for the functionaliy of the barrier buffer (for later
addition of other implementatiions)
- Add broader tests for the BarrierBuffer, inluding trailing data and
barrier races.
- Checkpoint barriers are handled by the buffer directly, rather than
being returned and re-injected.
- Simplify logic in the BarrierBuffer and fix certain corner cases.
- Give access to spill directories properly via I/O manager, rather than
via GlobalConfiguration singleton.
- Rename the "BarrierBufferIOTest" to "BarrierBufferMassiveRandomTest"
- A lot of code style/robustness fixes (proplery define constants,
visibility, exception signatures)
commit ba9e5a3cc1aacb5078fdc50fee12f60ccefd2563
Author: Stephan Ewen <[email protected]>
Date: 2015-07-26T17:05:30Z
[FLINK-2402] [streaming] Add a stream checkpoint barrier tracker.
The BarrierTracker is non-blocking and only counts barriers.
That way, it does not increase latency of records in the stream, but can
only be
used to obtain "at least once" processing guarantees.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---