[ 
https://issues.apache.org/jira/browse/FLINK-31410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhipeng Zhang updated FLINK-31410:
----------------------------------
    Description: 
In Flink ML, we use ListStateWithCache [2] to cache data in memory and on the 
filesystem. However, it does not yet support incremental snapshots: it writes 
all the data to the checkpoint stream every time snapshot is called [1], which 
can be inefficient.

[1][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116]
[2][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java]

  was:
In Flink ML, we used ListStateWithCache [2] to enable caching data in memory 
and filesystem. However, it does not support incremental snapshot now — It 
writes all the data to checkpoint stream when calling snapshot [1], which could 
be inefficient.

[1][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116]
[2][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java]


> ListStateWithCache Should support incremental snapshot
> ------------------------------------------------------
>
>                 Key: FLINK-31410
>                 URL: https://issues.apache.org/jira/browse/FLINK-31410
>             Project: Flink
>          Issue Type: Improvement
>          Components: Library / Machine Learning
>    Affects Versions: ml-2.2.0
>            Reporter: Zhipeng Zhang
>            Priority: Major
>
> In Flink ML, we use ListStateWithCache [2] to cache data in memory and on the 
> filesystem. However, it does not yet support incremental snapshots: it writes 
> all the data to the checkpoint stream every time snapshot is called [1], which 
> can be inefficient.
>  
>  
> [1][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116]
>  
> [2][https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java]
>  
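
To illustrate the difference the issue describes, here is a minimal sketch
contrasting a full snapshot with an incremental one. It is not the Flink ML
implementation: the class IncrementalSnapshotSketch and its methods are
hypothetical, the cache is assumed to be append-only and split into segments,
and the checkpoint stream is modelled as a plain list of strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the ListStateWithCache or DataCacheSnapshot API. */
final class IncrementalSnapshotSketch {
    /** Append-only segments of cached records (assumption: segments never change). */
    private final List<List<String>> segments = new ArrayList<>();

    /** Number of leading segments already written by an earlier incremental snapshot. */
    private int persistedSegments = 0;

    void addSegment(List<String> records) {
        segments.add(new ArrayList<>(records));
    }

    /** Full snapshot: rewrites every segment on each call. Returns records written. */
    int snapshotFull(List<String> checkpointStream) {
        int written = 0;
        for (List<String> segment : segments) {
            checkpointStream.addAll(segment);
            written += segment.size();
        }
        return written;
    }

    /** Incremental snapshot: writes only segments added since the previous call. */
    int snapshotIncremental(List<String> checkpointStream) {
        int written = 0;
        for (int i = persistedSegments; i < segments.size(); i++) {
            List<String> segment = segments.get(i);
            checkpointStream.addAll(segment);
            written += segment.size();
        }
        persistedSegments = segments.size();
        return written;
    }
}
```

With this layout, the cost of snapshotIncremental grows with the data added
since the last checkpoint, while snapshotFull grows with the total cache size.
A real implementation would also need to track which earlier checkpoints still
reference each segment before any segment can be deleted.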



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
