[ https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weijie Guo updated FLINK-12692:
-------------------------------
    Affects Version/s: 2.1.0

> Support disk spilling in HeapKeyedStateBackend
> ----------------------------------------------
>
>                 Key: FLINK-12692
>                 URL: https://issues.apache.org/jira/browse/FLINK-12692
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / State Backends
>    Affects Versions: 2.1.0
>            Reporter: Yu Li
>            Priority: Major
>              Labels: auto-unassigned, pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{HeapKeyedStateBackend}} is one of the two {{KeyedStateBackend}} implementations 
> in Flink. Since state lives as Java objects on the heap and de/serialization only 
> happens during state snapshot and restore, it outperforms 
> {{RocksDBKeyedStateBackend}} when all data can reside in memory.
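> As a minimal sketch of how a job picks between the two keyed state backends, 
> assuming the {{HashMapStateBackend}} / {{EmbeddedRocksDBStateBackend}} API (the 
> checkpoint path below is just a placeholder):
> {code:java}
> import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
> import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> 
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> 
> // Keyed state kept as Java objects on the heap (HeapKeyedStateBackend underneath).
> env.setStateBackend(new HashMapStateBackend());
> 
> // Keyed state serialized into RocksDB (RocksDBKeyedStateBackend underneath).
> // env.setStateBackend(new EmbeddedRocksDBStateBackend());
> 
> // Checkpoints go to durable storage either way (placeholder path).
> env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");
> {code}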
> However, along with this advantage, {{HeapKeyedStateBackend}} also has its 
> shortcomings. The most painful one is the difficulty of estimating the maximum 
> heap size (Xmx) to set, and the GC impact we suffer once the heap can no longer 
> hold all state data. There are several (often unavoidable) causes for such a 
> scenario, including but not limited to:
> * Memory overhead of the Java object representation (which can be tens of times 
> the serialized data size).
> * Data floods caused by bursty traffic.
> * Data accumulation caused by a malfunctioning source.
> To resolve this problem, we propose supporting the spilling of state data to 
> disk before heap memory is exhausted. We will monitor heap usage, choose the 
> coldest data to spill, and automatically reload it once heap memory is regained 
> after data removal or TTL expiration.
> For more details, please refer to the design doc and the mailing list discussion.
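> A purely illustrative sketch of the watermark-based spill/load decision described 
> above; class, method, and threshold names are hypothetical and not the actual 
> implementation from this ticket:
> {code:java}
> import java.lang.management.ManagementFactory;
> import java.lang.management.MemoryMXBean;
> import java.lang.management.MemoryUsage;
> 
> /** Hypothetical policy: spill the coldest state above a high watermark,
>  *  load spilled state back once usage drops below a low watermark. */
> public class HeapSpillPolicy {
> 
>     public enum Action { SPILL_COLDEST, LOAD_SPILLED, NONE }
> 
>     private static final double HIGH_WATERMARK = 0.70; // start spilling
>     private static final double LOW_WATERMARK  = 0.30; // start loading back
> 
>     private final MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
> 
>     /** Called periodically by a (hypothetical) heap-status monitor. */
>     public Action decide() {
>         MemoryUsage heap = memoryBean.getHeapMemoryUsage();
>         double usedRatio = (double) heap.getUsed() / heap.getMax();
>         if (usedRatio >= HIGH_WATERMARK) {
>             return Action.SPILL_COLDEST; // serialize the coldest data to local disk
>         } else if (usedRatio <= LOW_WATERMARK) {
>             return Action.LOAD_SPILLED;  // deserialize spilled data back onto the heap
>         }
>         return Action.NONE;
>     }
> }
> {code}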



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
