GitHub user StefanRRichter opened a pull request:

    https://github.com/apache/flink/pull/3466

    [FLINK-5958] Asynchronous snapshots for heap-based keyed state backends

    This PR introduces optional asynchronous snapshots for heap-based keyed
    state backends. The mechanism is based on providing a copy-on-write map
    structure as the basis for the backend.
    
    In a first step, the PR introduces abstractions around `StateTables` in
    `HeapKeyedStateBackend` so that we can support different state table
    implementations. We keep the original implementation as
    `NestedMapsStateTable`.
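    
    To illustrate the idea of the abstraction, here is a minimal sketch. It
    is not the code from this PR: `SimpleStateTable` and `NestedMapsTable`
    are hypothetical, simplified names, and the sketch leaves out key groups
    and serializers. It only shows how a backend could program against an
    interface while a nested-maps layout (namespace -> key -> state) sits
    behind it.

```java
import java.util.HashMap;
import java.util.Map;

public class StateTableSketch {

    /** Hypothetical, simplified stand-in for a state table abstraction. */
    interface SimpleStateTable<K, N, S> {
        S get(K key, N namespace);

        void put(K key, N namespace, S state);

        S remove(K key, N namespace);

        int size();
    }

    /** Nested-maps layout: one inner map of key -> state per namespace. */
    static final class NestedMapsTable<K, N, S> implements SimpleStateTable<K, N, S> {
        private final Map<N, Map<K, S>> namespaces = new HashMap<>();
        private int size;

        @Override
        public S get(K key, N namespace) {
            Map<K, S> byKey = namespaces.get(namespace);
            return byKey == null ? null : byKey.get(key);
        }

        @Override
        public void put(K key, N namespace, S state) {
            Map<K, S> byKey = namespaces.computeIfAbsent(namespace, n -> new HashMap<>());
            if (!byKey.containsKey(key)) {
                size++; // count only mappings that are new
            }
            byKey.put(key, state);
        }

        @Override
        public S remove(K key, N namespace) {
            Map<K, S> byKey = namespaces.get(namespace);
            if (byKey == null || !byKey.containsKey(key)) {
                return null;
            }
            S old = byKey.remove(key);
            size--;
            if (byKey.isEmpty()) {
                namespaces.remove(namespace); // drop empty inner maps
            }
            return old;
        }

        @Override
        public int size() {
            return size;
        }
    }

    public static void main(String[] args) {
        // The caller only depends on the interface, so the implementation can be swapped.
        SimpleStateTable<String, String, Long> table = new NestedMapsTable<>();
        table.put("user-1", "window-a", 3L);
        table.put("user-1", "window-b", 5L);
        System.out.println(table.get("user-1", "window-a"));
        System.out.println(table.size());
    }
}
```

    A second implementation with a different internal layout could be
    plugged in behind the same interface without changing the caller.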
    
    In the second step, we introduce `CopyOnWriteStateTable`, an
    implementation based on a single flat hash map that supports MVCC.
    Copy-on-write is performed lazily when snapshots are taken in parallel to
    other backend operations, and `TypeSerializer`s are used to realize the
    copying. The one remaining serial part of the snapshot is an array copy of
    the hash map's base array(s), which is very fast compared to the remaining
    parts of checkpointing.
    
    Another feature of `CopyOnWriteStateTable` is incremental rehashing, which
    reduces blocking behaviour on larger rehashes. This implementation is not a
    silver bullet compared to `NestedMapsStateTable`. In general, it can
    introduce a higher memory footprint, more GC activity, and lower base
    performance for some access patterns (but higher for others). The JavaDoc
    of the respective classes goes into more detail about this.
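    
    The versioning idea can be sketched roughly as follows. This is not the
    `CopyOnWriteStateTable` code: `CowMapSketch` and all of its members are
    hypothetical names, the bucket count is fixed (no incremental rehashing),
    there is no remove, and values are treated as immutable, so the sketch
    does not need serializer-based copies. It assumes a single writer thread
    and shows only how a shallow array copy plus per-entry versions lets a
    snapshot stay stable while the live map keeps changing.

```java
import java.util.HashMap;
import java.util.Map;

public class CowMapSketch<K, V> {

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        final int version; // map version at which this entry object was created

        Entry(K key, V value, Entry<K, V> next, int version) {
            this.key = key;
            this.value = value;
            this.next = next;
            this.version = version;
        }
    }

    /** Frozen view; safe to read from another thread once handed over. */
    public static final class Snapshot<K, V> {
        private final Entry<K, V>[] frozen;

        private Snapshot(Entry<K, V>[] frozen) {
            this.frozen = frozen;
        }

        public Map<K, V> toMap() {
            Map<K, V> out = new HashMap<>();
            for (Entry<K, V> head : frozen) {
                for (Entry<K, V> e = head; e != null; e = e.next) {
                    out.put(e.key, e.value);
                }
            }
            return out;
        }
    }

    private final Entry<K, V>[] buckets = newArray(16);
    private int version = 0;          // version stamped on newly created entries
    private int snapshotVersion = -1; // entries with version <= this may be shared

    @SuppressWarnings("unchecked")
    private static <K, V> Entry<K, V>[] newArray(int n) {
        return (Entry<K, V>[]) new Entry[n];
    }

    private int index(K key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public V get(K key) {
        for (Entry<K, V> e = buckets[index(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public void put(K key, V value) {
        int i = index(key);
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                if (e.version > snapshotVersion) {
                    e.value = value; // entry is private to the live map
                } else {
                    buckets[i] = copyChainUpTo(buckets[i], e, value); // lazy copy
                }
                return;
            }
        }
        buckets[i] = new Entry<>(key, value, buckets[i], version); // new head, old chain untouched
    }

    /** Copies the chain from head through target; the tail after target is reused as-is. */
    private Entry<K, V> copyChainUpTo(Entry<K, V> head, Entry<K, V> target, V newValue) {
        Entry<K, V> replacement = new Entry<>(target.key, newValue, target.next, version);
        if (head == target) {
            return replacement;
        }
        Entry<K, V> newHead = new Entry<>(head.key, head.value, null, version);
        Entry<K, V> tail = newHead;
        for (Entry<K, V> e = head.next; e != target; e = e.next) {
            tail.next = new Entry<>(e.key, e.value, null, version);
            tail = tail.next;
        }
        tail.next = replacement;
        return newHead;
    }

    /** The only synchronous step: a shallow copy of the bucket array. */
    public Snapshot<K, V> snapshot() {
        Entry<K, V>[] frozen = buckets.clone();
        snapshotVersion = version;
        version++;
        return new Snapshot<>(frozen);
    }

    public static void main(String[] args) throws Exception {
        CowMapSketch<String, Integer> map = new CowMapSketch<>();
        map.put("a", 1);
        map.put("b", 2);
        Snapshot<String, Integer> snap = map.snapshot();
        map.put("a", 100); // triggers a copy, the snapshot keeps a=1
        map.put("c", 3);   // not visible in the snapshot
        Thread writer = new Thread(() -> System.out.println("snapshot: " + snap.toMap()));
        writer.start();
        writer.join();
        System.out.println("live a = " + map.get("a"));
    }
}
```

    In this sketch, entries reachable from a snapshot are never mutated, so
    the cost of taking a snapshot is limited to the array clone, and the
    copying cost is paid later only for entries that are actually written.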


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StefanRRichter/flink state-table-interface-consolidate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3466.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3466
    
----
commit dd0863fc029a0d5ab3d52bb402663c8543f7f483
Author: Stefan Richter <[email protected]>
Date:   2017-02-20T17:12:10Z

    Introduce abstraction for StateTable

commit a0cc386d29dd6a76926fb260cac43019add5bc2a
Author: Stefan Richter <[email protected]>
Date:   2017-03-03T09:51:15Z

    Improved copy performance for ArrayListSerializer

commit f13182fe35fc431bbb7b06966914f246a1eab55c
Author: Stefan Richter <[email protected]>
Date:   2017-03-03T10:08:00Z

    Asynchronous snapshots through CopyOnWriteStateTable

commit 664be6e8c258cdb711b37d3ce76b03095c2c042f
Author: Stefan Richter <[email protected]>
Date:   2017-03-03T09:25:43Z

    Improve ManualWindowSpeedITCase by randomizing the access pattern

commit 5e8000e2272605db6acc612d39feab0ccc6c4acd
Author: Stefan Richter <[email protected]>
Date:   2017-03-03T09:50:52Z

    Additional unit tests

----


