[ https://issues.apache.org/jira/browse/FLINK-31410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhipeng Zhang updated FLINK-31410:
----------------------------------
    Description:
In Flink ML, we use ListStateWithCache [2] to enable caching data in memory and on the filesystem. However, it does not support incremental snapshots at present. It writes all the data to the checkpoint stream whenever calling snapshot [1], which could be inefficient.

[1][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116]
[2][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java]

  was:
In Flink ML, we used ListStateWithCache [2] to enable caching data in memory and on the filesystem. However, it does not support incremental snapshots at present. It writes all the data to the checkpoint stream when calling snapshot [1], which could be inefficient.

[1][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116]
[2][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java]

> ListStateWithCache Should support incremental snapshot
> ------------------------------------------------------
>
>                 Key: FLINK-31410
>                 URL: https://issues.apache.org/jira/browse/FLINK-31410
>             Project: Flink
>          Issue Type: Improvement
>          Components: Library / Machine Learning
>    Affects Versions: ml-2.2.0
>            Reporter: Zhipeng Zhang
>            Priority: Major
>
> In Flink ML, we use ListStateWithCache [2] to enable caching data in memory
> and on the filesystem. However, it does not support incremental snapshots at
> present. It writes all the data to the checkpoint stream whenever calling
> snapshot [1], which could be inefficient.
>
>
> [1][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116]
>
> [2][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java]
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
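
To make the difference between a full and an incremental snapshot concrete, here is a minimal sketch in plain Java. It is not the Flink ML implementation and uses no Flink APIs: the class IncrementalSnapshotSketch, its segment layout, and the method names are all hypothetical, chosen only for illustration. It assumes the cache is append-only and that sealed segments written by an earlier checkpoint can be referenced again rather than rewritten.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an append-only cache split into fixed-size segments.
 * Not the ListStateWithCache implementation; names and layout are assumptions.
 */
final class IncrementalSnapshotSketch {
    private final int segmentSize;
    // Sealed segments are never modified after they are added here.
    private final List<List<String>> sealedSegments = new ArrayList<>();
    private List<String> openSegment = new ArrayList<>();
    // Number of sealed segments already written by a previous incremental snapshot.
    private int segmentsAlreadyCheckpointed = 0;

    IncrementalSnapshotSketch(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    /** Appends a record and seals the open segment once it is full. */
    void add(String record) {
        openSegment.add(record);
        if (openSegment.size() == segmentSize) {
            sealedSegments.add(openSegment);
            openSegment = new ArrayList<>();
        }
    }

    /** Full snapshot: writes every record on every call. Returns records written. */
    int fullSnapshot(List<String> checkpointStream) {
        int written = 0;
        for (List<String> segment : sealedSegments) {
            checkpointStream.addAll(segment);
            written += segment.size();
        }
        checkpointStream.addAll(openSegment);
        written += openSegment.size();
        return written;
    }

    /**
     * Incremental snapshot: writes only sealed segments added since the last
     * call, plus the still-mutable open segment. Returns records written.
     */
    int incrementalSnapshot(List<String> checkpointStream) {
        int written = 0;
        for (int i = segmentsAlreadyCheckpointed; i < sealedSegments.size(); i++) {
            checkpointStream.addAll(sealedSegments.get(i));
            written += sealedSegments.get(i).size();
        }
        checkpointStream.addAll(openSegment);
        written += openSegment.size();
        segmentsAlreadyCheckpointed = sealedSegments.size();
        return written;
    }
}
```

In this sketch the cost of a full snapshot grows with the total cache size, while the cost of an incremental snapshot grows only with the data added since the previous one. A real implementation would also need to track which earlier checkpoint files each snapshot still references so they are not deleted prematurely; that bookkeeping is left out here.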