[
https://issues.apache.org/jira/browse/KAFKA-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331348#comment-16331348
]
Guozhang Wang commented on KAFKA-3184:
--------------------------------------
[[email protected]] sorry for the late reply!
Your understanding is basically right, and here are my thoughts about the
flushing:
1. It should not be expensive or stop-the-world, since flush may be called on
each commit. I was thinking that it could either be async (but then we need to
keep the checkpointed offsets consistent with the checkpoint image on disk
itself), or we only do the checkpointing every N flush calls.
2. As for {{persistent()}}, currently it is only used in
{{ProcessorStateManager#checkpoint}} and
{{StoreChangelogReader#restoredOffsets}}; with in-memory state stores being
checkpointed periodically, I think we can just deprecate this flag and let
these two callers always checkpoint / save restored offsets.
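To make the "every N flush calls" option in point 1 concrete, here is a rough sketch. All names in it ({{PeriodicCheckpointStore}}, {{Checkpoint}}, {{checkpointInterval}}) are hypothetical and not existing Kafka Streams APIs, and the snapshot is kept in memory rather than written to disk. The only point is that the image and its offset are captured together, so they cannot drift apart.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names are Kafka Streams APIs.
final class PeriodicCheckpointStore {

    /** An immutable store image paired with the changelog offset it reflects. */
    static final class Checkpoint {
        final Map<String, String> image;
        final long offset;

        Checkpoint(Map<String, String> source, long offset) {
            // Copy the map so later writes to the store do not leak into the image.
            this.image = Collections.unmodifiableMap(new HashMap<>(source));
            this.offset = offset;
        }
    }

    private final Map<String, String> data = new HashMap<>();
    private final int checkpointInterval; // N: checkpoint on every N-th flush
    private int flushesSinceCheckpoint = 0;
    private Checkpoint lastCheckpoint = null;

    PeriodicCheckpointStore(int checkpointInterval) {
        if (checkpointInterval < 1) {
            throw new IllegalArgumentException("checkpointInterval must be >= 1");
        }
        this.checkpointInterval = checkpointInterval;
    }

    void put(String key, String value) {
        data.put(key, value);
    }

    /** Called on each commit; only every N-th call pays for a checkpoint. */
    void flush(long committedOffset) {
        flushesSinceCheckpoint++;
        if (flushesSinceCheckpoint < checkpointInterval) {
            return; // cheap path: nothing is copied
        }
        flushesSinceCheckpoint = 0;
        // Image and offset are taken in one step, so they stay consistent.
        lastCheckpoint = new Checkpoint(data, committedOffset);
    }

    /** Returns the most recent checkpoint, or null if none has been taken yet. */
    Checkpoint lastCheckpoint() {
        return lastCheckpoint;
    }
}
```

With {{checkpointInterval = 3}}, the first two flushes return immediately and the third one records the image together with the offset passed to it. An async variant would presumably hand the copied image to a background writer instead, but it would still have to publish the image and the offset as a single unit.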
> Add Checkpoint for In-memory State Store
> ----------------------------------------
>
> Key: KAFKA-3184
> URL: https://issues.apache.org/jira/browse/KAFKA-3184
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Priority: Major
> Labels: user-experience
>
> Currently Kafka Streams does not checkpoint the persistent state store upon
> committing, because doing so would be expensive: it is "stopping the world"
> and writes to disk. For example, a naive RocksDB checkpoint would require
> copying the file directory.
> However, for in-memory stores checkpointing may be doable in an asynchronous
> manner, hence it can be done quickly. The benefit of having intermediate
> checkpoints is to avoid restoring from scratch if standby tasks are not
> present.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)