[ https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452259#comment-17452259 ]
Mingmin Xu commented on FLINK-12692:
------------------------------------
Hello, is there a timeline for merging this into the Flink main repository?
I'm looking for a similar solution, with some changes to spill data remotely
so that the local state stays small for fast backup/restore. It seems I could
extend the feature based on SpillableStateBackend.
> Support disk spilling in HeapKeyedStateBackend
> ----------------------------------------------
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / State Backends
> Reporter: Yu Li
> Priority: Major
> Labels: auto-unassigned, pull-request-available
> Fix For: 1.15.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {{HeapKeyedStateBackend}} is one of the two {{KeyedStateBackends}} in Flink.
> Since state lives as Java objects on the heap and de/serialization only
> happens during state snapshot and restore, it outperforms
> {{RocksDBKeyedStateBackend}} when all data can reside in memory.
> However, along with this advantage, {{HeapKeyedStateBackend}} also has its
> shortcomings. The most painful one is the difficulty of estimating the
> maximum heap size (Xmx) to set, and we suffer from GC impact once the
> heap memory is not enough to hold all state data. There are several
> (inevitable) causes of such a scenario, including (but not limited to):
> * Memory overhead of the Java object representation (tens of times the
> serialized data size).
> * Data flood caused by burst traffic.
> * Data accumulation caused by source malfunction.
> To resolve this problem, we propose a solution that spills state data
> to disk before heap memory is exhausted. We will monitor the heap usage,
> choose the coldest data to spill, and automatically reload it when heap
> memory is regained after data removal or TTL expiration.
> For more details, please refer to the design doc and mailing list discussion.
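To make the spill/reload idea in the description concrete, here is a minimal
sketch of it. It is not the FLINK-12692 implementation. The class name
SpillableMapSketch and all of its methods are hypothetical, a fixed entry
budget stands in for real heap monitoring, and an in-memory map of serialized
bytes stands in for disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; names and structure are assumptions. */
class SpillableMapSketch<K, V extends Serializable> {
    // Access-ordered map: iteration starts at the least recently used (coldest) entry.
    private final LinkedHashMap<K, V> onHeap = new LinkedHashMap<>(16, 0.75f, true);
    // Stand-in for disk: spilled values exist only in serialized form.
    private final Map<K, byte[]> spilled = new HashMap<>();
    // Stand-in for heap monitoring: a fixed budget of on-heap entries.
    private final int maxOnHeap;

    SpillableMapSketch(int maxOnHeap) {
        this.maxOnHeap = maxOnHeap;
    }

    void put(K key, V value) {
        spilled.remove(key);
        onHeap.put(key, value);
        spillColdest();
    }

    V get(K key) {
        V hot = onHeap.get(key); // also marks the entry as recently used
        if (hot != null) {
            return hot;
        }
        byte[] bytes = spilled.get(key);
        return bytes == null ? null : deserialize(bytes);
    }

    void remove(K key) {
        onHeap.remove(key);
        spilled.remove(key);
        reload(); // room was regained, so bring spilled data back
    }

    boolean isSpilled(K key) { return spilled.containsKey(key); }
    int onHeapSize() { return onHeap.size(); }
    int spilledSize() { return spilled.size(); }

    // Move the coldest entries off the heap until the budget is met.
    private void spillColdest() {
        Iterator<Map.Entry<K, V>> it = onHeap.entrySet().iterator();
        while (onHeap.size() > maxOnHeap && it.hasNext()) {
            Map.Entry<K, V> coldest = it.next();
            spilled.put(coldest.getKey(), serialize(coldest.getValue()));
            it.remove();
        }
    }

    // Move spilled entries back onto the heap while there is room.
    private void reload() {
        Iterator<Map.Entry<K, byte[]>> it = spilled.entrySet().iterator();
        while (onHeap.size() < maxOnHeap && it.hasNext()) {
            Map.Entry<K, byte[]> entry = it.next();
            onHeap.put(entry.getKey(), deserialize(entry.getValue()));
            it.remove();
        }
    }

    private static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    @SuppressWarnings("unchecked")
    private V deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (V) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a budget of two entries, putting "a" and "b", reading "a", and then
putting "c" spills "b", because it is the least recently used key. Removing
"a" afterwards frees a slot and "b" is reloaded. A real backend would spill on
measured heap usage rather than entry count, and would also need to handle
snapshots and TTL, which this sketch leaves out.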