Hi Yu,

+1 to move forward with efforts to add this feature.
As mentioned in the document as well as in some offline discussions, from my
side the only comments I have are related to how we snapshot the off-heap
key groups. I think a recent discussion I posted about savepoint format
unification for keyed state, as well as reworking the abstractions for
snapshot strategies [1], will be relevant here.

Cheers,
Gordon

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-41-Unify-Keyed-State-Snapshot-Binary-Format-for-Savepoints-td29197.html

On Wed, May 29, 2019 at 5:08 PM Yuzhao Chen <yuzhao....@gmail.com> wrote:

> +1, thanks for your nice work, Yu Li!
>
> Best,
> Danny Chan
>
> On 2019-05-24 at 8:51 PM +0800, Yu Li <car...@gmail.com> wrote:
> > Hi All,
> >
> > As mentioned in our talk[1] given at Flink Forward China 2018, we have
> > improved HeapKeyedStateBackend to support disk spilling and put it into
> > production here at Alibaba for last year's Singles' Day. Now we're ready
> > to upstream our work, and the design doc is up for review[2]. Please let
> > us know your thoughts on the feature; any comment is welcomed/appreciated.
> >
> > We plan to keep the discussion open for at least 72 hours, and will
> > create an umbrella JIRA and subtasks if there are no objections. Thanks.
> >
> > Below is a brief description of the motivation for the work, FYI:
> >
> > HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink.
> > Since state lives as Java objects on the heap in HeapKeyedStateBackend,
> > and de/serialization only happens during state snapshot and restore, it
> > outperforms RocksDBKeyedStateBackend when all data can reside in memory.
> > However, along with this advantage, HeapKeyedStateBackend also has its
> > shortcomings, and the most painful one is the difficulty of estimating
> > the maximum heap size (Xmx) to set; we will suffer from GC impact once
> > the heap memory is not enough to hold all state data.
> > There are several (inevitable) causes of such a scenario, including
> > (but not limited to):
> >
> > * Memory overhead of the Java object representation (tens of times the
> >   serialized data size).
> > * Data flood caused by burst traffic.
> > * Data accumulation caused by source malfunction.
> >
> > To resolve this problem, we propose a solution that supports spilling
> > state data to disk before heap memory is exhausted. We will monitor the
> > heap usage and choose the coldest data to spill, and reload it
> > automatically when heap memory is regained after data removal or TTL
> > expiration.
> >
> > [1]
> > https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf
> > [2]
> > https://docs.google.com/document/d/1rtWQjIQ-tYWt0lTkZYdqTM6gQUleV8AXrfTOyWUZMf4/edit?usp=sharing
> >
> > Best Regards,
> > Yu
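For readers following the thread, the spill/reload behavior described above (monitor heap usage, spill the coldest entries to disk, reload them on access when memory allows) can be sketched as a toy Java example. This is purely illustrative: the names (`SpillableMap`, `heapBudget`, the in-memory "disk" map) are hypothetical and are not part of the FLIP's actual design or Flink's API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch only: a map that evicts its coldest entries to a simulated
// disk store once an in-memory entry budget is exceeded, and reloads
// spilled entries transparently on access.
public class SpillableMapSketch {

    static final class SpillableMap<K, V> {
        private final int heapBudget;                   // max entries kept on heap
        private final LinkedHashMap<K, V> heap;         // access-ordered: coldest first
        private final Map<K, V> disk = new HashMap<>(); // stands in for on-disk storage

        SpillableMap(int heapBudget) {
            this.heapBudget = heapBudget;
            // accessOrder = true => iteration order is least-recently-used first
            this.heap = new LinkedHashMap<>(16, 0.75f, true);
        }

        void put(K key, V value) {
            heap.put(key, value);
            spillIfNeeded();
        }

        V get(K key) {
            V v = heap.get(key);
            if (v == null && disk.containsKey(key)) {
                // Reload from "disk" on access; this may in turn spill
                // whichever entry is now the coldest.
                v = disk.remove(key);
                heap.put(key, v);
                spillIfNeeded();
            }
            return v;
        }

        private void spillIfNeeded() {
            // LinkedHashMap in access order iterates coldest-first,
            // so spill from the front until we are back under budget.
            Iterator<Map.Entry<K, V>> coldest = heap.entrySet().iterator();
            while (heap.size() > heapBudget && coldest.hasNext()) {
                Map.Entry<K, V> e = coldest.next();
                disk.put(e.getKey(), e.getValue());
                coldest.remove();
            }
        }

        int onHeap() { return heap.size(); }
        int onDisk() { return disk.size(); }
    }

    public static void main(String[] args) {
        // Run with -ea to enable the assertions below.
        SpillableMap<String, Integer> state = new SpillableMap<>(2);
        state.put("a", 1);
        state.put("b", 2);
        state.put("c", 3); // exceeds budget: "a" (coldest) spills to disk
        assert state.onHeap() == 2 && state.onDisk() == 1;
        assert state.get("a") == 1; // reloaded; "b" becomes coldest and spills
        assert state.onDisk() == 1;
    }
}
```

The real backend would of course key coldness on access statistics per key group, serialize spilled data, and couple eviction to actual heap-usage monitoring rather than an entry count; the sketch only shows the spill-coldest / reload-on-access shape of the idea.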