[ 
https://issues.apache.org/jira/browse/KAFKA-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331348#comment-16331348
 ] 

Guozhang Wang commented on KAFKA-3184:
--------------------------------------

[[email protected]] sorry for the late reply!

Your understanding is basically right, and here are my thoughts about the 
flushing:

1. It should not be expensive or stop-the-world, since flush may be 
called on each commit. I was thinking that it could either be async (but we 
need to keep the checkpointed offset values consistent with the 
checkpoint image on disk itself), or we only do the checkpointing every N 
flush calls.

2. As for {{persistent()}}, currently it is only used in 
{{ProcessorStateManager#checkpoint}} and 
{{StoreChangelogReader#restoredOffsets}}; with in-memory state stores being 
checkpointed periodically, I think we can just deprecate this flag and let 
these two callers always checkpoint / save restored offsets.
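
To make the "checkpoint every N flush calls" option in point 1 concrete, here is a rough sketch. It is not Kafka Streams code: {{CheckpointingInMemoryStore}}, its methods, and the file layout are all hypothetical names made up for illustration. It writes the checkpoint synchronously rather than async, and it keeps the offset consistent with the image by putting both in one file that is swapped in with a single atomic rename.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Kafka Streams.
class CheckpointingInMemoryStore {
    private final Map<String, String> data = new TreeMap<>();
    private final Path checkpointFile;
    private final int checkpointInterval;   // the "N" in "every N flush calls"
    private long changelogOffset = -1L;     // offset of the last applied record
    private int flushesSinceCheckpoint = 0;

    CheckpointingInMemoryStore(Path checkpointFile, int checkpointInterval) {
        if (checkpointInterval < 1) {
            throw new IllegalArgumentException("checkpointInterval must be >= 1");
        }
        this.checkpointFile = checkpointFile;
        this.checkpointInterval = checkpointInterval;
    }

    // Apply one record and remember the changelog offset it came from.
    void put(String key, String value, long offset) {
        data.put(key, value);
        changelogOffset = offset;
    }

    // Called on every commit; only every N-th call pays for a checkpoint.
    // Returns true if a checkpoint was written.
    boolean flush() throws IOException {
        flushesSinceCheckpoint++;
        if (flushesSinceCheckpoint < checkpointInterval) {
            return false;
        }
        writeCheckpoint();
        flushesSinceCheckpoint = 0;
        return true;
    }

    // The offset and the store image live in the same file, and the file is
    // replaced with one rename, so a reader never sees an offset that does not
    // match the image. Assumes keys and values contain no tabs or newlines.
    private void writeCheckpoint() throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append(changelogOffset).append('\n');
        for (Map.Entry<String, String> e : data.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, checkpointFile, StandardCopyOption.ATOMIC_MOVE);
    }

    // On restart, restoration would resume from the offset after this one.
    static long readCheckpointedOffset(Path file) throws IOException {
        return Long.parseLong(Files.readAllLines(file, StandardCharsets.UTF_8).get(0));
    }
}
```

With N = 3, the first two {{flush()}} calls return immediately and only the third writes the file. An async variant would need to snapshot the map and the offset together before handing them to a background thread, which is the consistency concern mentioned above.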

> Add Checkpoint for In-memory State Store
> ----------------------------------------
>
>                 Key: KAFKA-3184
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3184
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: user-experience
>
> Currently Kafka Streams does not make a checkpoint of the persistent state 
> store upon committing, which would be expensive since it is "stopping the 
> world" and writes to disk: for example, RocksDB would require you to copy the 
> file directory to make a copy naively. 
> However, for in-memory stores checkpointing may be doable in an asynchronous 
> manner, hence it can be done quickly. And the benefit of having an intermediate 
> checkpoint is to avoid restoring from scratch if standby tasks are not 
> present.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
