GitHub user StefanRRichter opened a pull request:
https://github.com/apache/flink/pull/3466
[FLINK-5958] Asynchronous snapshots for heap-based keyed state backends
This PR introduces optional asynchronous snapshots for heap-based keyed
state backends. The mechanism is based on providing a copy-on-write map
structure as the basis for the backend.
In a first step, the PR introduces abstractions around `StateTables` in
`HeapKeyedStateBackend` so that we can support different state table
implementations. We keep the original implementation as `NestedMapsStateTable`.
In the second step, we introduce `CopyOnWriteStateTable`, an implementation
based on a single flat hash map that supports MVCC. Copy-on-write is eagerly
performed when snapshots are taken in parallel to other backend operations, and
`TypeSerializer`s are used to realize the copying. The one remaining serial part
of the whole snapshot is an array copy of the hash map's base array(s),
which is very fast compared to all the remaining parts of the checkpointing.
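The versioning idea can be illustrated with a minimal sketch. It is not the code from this PR: the class `CowTableSketch`, its methods, and the `UnaryOperator` copier (standing in for a `TypeSerializer` copy) are hypothetical, keys and hash chaining are omitted, and snapshot release is not handled. It only shows the shallow array copy at snapshot time and the deep copy made when mutable access is requested for state that a snapshot still shares.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

/** Hypothetical, heavily simplified sketch; not the CopyOnWriteStateTable from this PR. */
final class CowTableSketch<S> {

    /** Immutable entry tagged with the table version at which it was created. */
    private static final class Entry<S> {
        final S state;
        final int version;

        Entry(S state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    /** Read-only view over the entry array as it was when the snapshot was taken. */
    static final class Snapshot<S> {
        private final Entry<S>[] frozen;

        Snapshot(Entry<S>[] frozen) {
            this.frozen = frozen;
        }

        S read(int slot) {
            Entry<S> e = frozen[slot];
            return e == null ? null : e.state;
        }
    }

    private final Entry<S>[] slots;          // slots are indexed directly; no hashing here
    private final UnaryOperator<S> copier;   // assumption: stands in for a serializer-based deep copy
    private int version = 0;
    private int lastSnapshotVersion = -1;

    @SuppressWarnings("unchecked")
    CowTableSketch(int capacity, UnaryOperator<S> copier) {
        this.slots = (Entry<S>[]) new Entry[capacity];
        this.copier = copier;
    }

    void put(int slot, S state) {
        // Replaces the reference in the live array only; a snapshot keeps its old entry.
        slots[slot] = new Entry<>(state, version);
    }

    /** Returns a state object that the caller may mutate in place. */
    S getForUpdate(int slot) {
        Entry<S> e = slots[slot];
        if (e == null) {
            return null;
        }
        if (e.version <= lastSnapshotVersion) {
            // The entry may still be referenced by a snapshot: copy before handing it out.
            e = new Entry<>(copier.apply(e.state), version);
            slots[slot] = e;
        }
        return e.state;
    }

    /** The only serial step: a shallow copy of the entry array. */
    Snapshot<S> snapshot() {
        Entry<S>[] copy = Arrays.copyOf(slots, slots.length);
        lastSnapshotVersion = version;
        version++;
        return new Snapshot<>(copy);
    }
}
```

With a `StringBuilder` as state, a snapshot taken after `put(0, new StringBuilder("a"))` still reads "a" after the live table appends "b" through `getForUpdate(0)`, because the append operates on a copy.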
Another feature of `CopyOnWriteStateTable` is incremental rehashing, which
reduces blocking behaviour on larger rehashes. This implementation is not a
silver bullet compared to the `NestedMapsStateTable`. In general, it can
introduce a higher memory footprint, more GC activity, and lower base
performance for some access patterns (but higher for others). The JavaDoc of
the respective classes goes into more detail about this.
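As an illustration of incremental rehashing in general, here is a minimal sketch under stated assumptions. It is not the implementation in this PR: the class `IncrementalRehashMapSketch`, the constant `MIGRATE_PER_OP`, the `int` keys and values, and the load factor of 1 are all hypothetical choices made to keep the example short. Instead of moving every entry at once when the table grows, each `put` or `get` migrates a small, fixed number of buckets to the larger table.

```java
/** Hypothetical sketch of incremental rehashing; not the code from this PR. */
final class IncrementalRehashMapSketch {

    private static final class Node {
        final int key;
        int value;
        Node next;

        Node(int key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final int MIGRATE_PER_OP = 2; // buckets moved per operation (assumption)

    private Node[] primary = new Node[4]; // length is always a power of two
    private Node[] rehashTarget;          // non-null only while a rehash is in progress
    private int rehashIndex;              // primary buckets below this index are already migrated
    private int size;

    void put(int key, int value) {
        stepRehash();
        Node existing = find(key);
        if (existing != null) {
            existing.value = value;
            return;
        }
        Node[] table = tableFor(key);
        int i = index(key, table.length);
        table[i] = new Node(key, value, table[i]);
        size++;
        if (rehashTarget == null && size > primary.length) {
            // Start a rehash, but do not move any entries yet.
            rehashTarget = new Node[primary.length * 2];
            rehashIndex = 0;
        }
    }

    Integer get(int key) {
        stepRehash();
        Node n = find(key);
        return n == null ? null : n.value;
    }

    boolean isRehashing() {
        return rehashTarget != null;
    }

    int size() {
        return size;
    }

    /** Picks the table that currently owns the key's bucket. */
    private Node[] tableFor(int key) {
        if (rehashTarget != null && index(key, primary.length) < rehashIndex) {
            return rehashTarget;
        }
        return primary;
    }

    private Node find(int key) {
        Node[] table = tableFor(key);
        for (Node n = table[index(key, table.length)]; n != null; n = n.next) {
            if (n.key == key) {
                return n;
            }
        }
        return null;
    }

    /** Moves at most MIGRATE_PER_OP buckets, bounding the work done per operation. */
    private void stepRehash() {
        if (rehashTarget == null) {
            return;
        }
        for (int moved = 0; moved < MIGRATE_PER_OP && rehashIndex < primary.length; moved++) {
            Node n = primary[rehashIndex];
            primary[rehashIndex] = null;
            while (n != null) {
                Node next = n.next;
                int i = index(n.key, rehashTarget.length);
                n.next = rehashTarget[i];
                rehashTarget[i] = n;
                n = next;
            }
            rehashIndex++;
        }
        if (rehashIndex == primary.length) {
            primary = rehashTarget; // rehash finished: the larger table becomes primary
            rehashTarget = null;
        }
    }

    private static int index(int key, int length) {
        return (key ^ (key >>> 16)) & (length - 1);
    }
}
```

The trade-off in this sketch is that lookups must check which of the two tables owns a bucket while a rehash is in progress, in exchange for never blocking on a full-table migration.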
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StefanRRichter/flink state-table-interface-consolidate
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3466.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3466
----
commit dd0863fc029a0d5ab3d52bb402663c8543f7f483
Author: Stefan Richter <[email protected]>
Date: 2017-02-20T17:12:10Z
Introduce abstraction for StateTable
commit a0cc386d29dd6a76926fb260cac43019add5bc2a
Author: Stefan Richter <[email protected]>
Date: 2017-03-03T09:51:15Z
Improved copy performance for ArrayListSerializer
commit f13182fe35fc431bbb7b06966914f246a1eab55c
Author: Stefan Richter <[email protected]>
Date: 2017-03-03T10:08:00Z
Asynchronous snapshots through CopyOnWriteStateTable
commit 664be6e8c258cdb711b37d3ce76b03095c2c042f
Author: Stefan Richter <[email protected]>
Date: 2017-03-03T09:25:43Z
Improve ManualWindowSpeedITCase by randomizing the access pattern
commit 5e8000e2272605db6acc612d39feab0ccc6c4acd
Author: Stefan Richter <[email protected]>
Date: 2017-03-03T09:50:52Z
Additional unit tests
----