[ 
https://issues.apache.org/jira/browse/FLINK-25528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469630#comment-17469630
 ] 

刘方奇 commented on FLINK-25528:
-----------------------------

[~sjwiesman] Thanks for your reply; I understand your point. However, even 
though both an incremental snapshot and a savepoint capture the entire state, 
their performance differs greatly.

The reason lies in the implementation: an incremental snapshot does not need to 
read every single KV entry and write it to the output stream, but a savepoint 
does, which adds significant overhead. Additionally, an incremental snapshot 
uploads SST files over multiple output streams, whereas a savepoint uses only a 
single output stream for the snapshot work.

The relevant code is in 
org.apache.flink.contrib.streaming.state.snapshot.RocksFullSnapshotStrategy and 
org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.
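
To make the contrast concrete, here is a toy sketch in plain Java. It is not the 
Flink code: the class SnapshotSketch and both method names are made up for 
illustration, and in-memory byte arrays stand in for the state, the SST files, 
and the output streams. It only shows the shape of the two approaches: 
per-entry serialization into one stream versus copying whole files over several 
streams in parallel.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in Flink.
public class SnapshotSketch {

    // Savepoint-like path: visit every key/value entry and serialize it
    // into ONE output stream, so cost grows with the number of entries.
    static byte[] fullSnapshot(Map<String, String> state) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : state.entrySet()) {
            byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] v = e.getValue().getBytes(StandardCharsets.UTF_8);
            out.write(k, 0, k.length);
            out.write(v, 0, v.length);
        }
        return out.toByteArray();
    }

    // Incremental-like path: treat each file as opaque bytes and copy it
    // through its own stream, with several copies running in parallel.
    static List<byte[]> incrementalSnapshot(List<byte[]> sstFiles, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (byte[] file : sstFiles) {
                futures.add(pool.submit(() -> {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    out.write(file, 0, file.length); // no per-entry decoding
                    return out.toByteArray();
                }));
            }
            List<byte[]> uploaded = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                uploaded.add(f.get()); // results keep the input file order
            }
            return uploaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> state = new TreeMap<>();
        state.put("a", "1");
        state.put("b", "2");
        System.out.println("full snapshot bytes: " + fullSnapshot(state).length);

        List<byte[]> files = new ArrayList<>();
        files.add(new byte[] {1, 2, 3});
        files.add(new byte[] {4, 5});
        System.out.println("files uploaded: " + incrementalSnapshot(files, 2).size());
    }
}
```

The toy version ignores everything that matters in practice (serialization 
format, file reuse between checkpoints, error handling); it is only meant to 
show why the first path scales with the number of entries and cannot be spread 
across streams as easily as the second.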

BTW, with very large state the difference becomes even more obvious. In our 
tests, when snapshotting state larger than 1 TB, the snapshot appeared to never 
finish when we used a savepoint.

> state processor api do not support increment checkpoint
> -------------------------------------------------------
>
>                 Key: FLINK-25528
>                 URL: https://issues.apache.org/jira/browse/FLINK-25528
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / State Processor, Runtime / State Backends
>            Reporter: 刘方奇
>            Priority: Major
>
> As the title says, in the state-processor-api we use the savepoint option to 
> snapshot state by default in org.apache.flink.state.api.output.SnapshotUtils.
> But in many cases we may need to snapshot state incrementally, which has 
> better performance than a savepoint.
> Shall we add a config option to choose the checkpoint type, just like in the 
> Flink streaming backend?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
