[ 
https://issues.apache.org/jira/browse/KAFKA-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106558#comment-17106558
 ] 

Matthias J. Sax commented on KAFKA-3184:
----------------------------------------

I am personally not sure how useful "local checkpointing" is at all? Note that 
for persistent stores, the ides is to allow holding state that is larger than 
main memory. It's not really related to fault-tolerance or similar (it only has 
the nice side effect that rolling restarts are quick – for this case, maybe 
dumping the in-memory store on disk during shutdown might be helpful; but this 
would not be regular checkpointing). Also, to avoid too long state recovery 
times, used can configure standbys.

For scale-out, the same issue arrises for in-memory and persistent stores and 
it's addressed via KIP-441. So we should not conflate orthogonal issue.

Having a remote checkpoint mechanism would be a nice to have feature, but it 
raises a lot of complex issues we need to address. (1) where to actually put 
the store? Kafka Streams is a library and should not have other dependencies by 
default; thus, remote checkpointing must be an opt-in feature only. (2) How can 
we do incremental checkpointing (if it's not incremental, it's too heavy weight 
and we don't need to build it at all), (3) how do we "compact" the check point 
increments in the remote location? This is the hardest issue we need to solve.

There was also the idea to actually re-use standbys for quick recovery: instead 
of checkpointing to remote storage, one just configures standbys. On recovery, 
instead of reading the changelog, we copy RocksDB sst files directly from the 
standby to the active (or to be more precise, the standby would be come the 
active anyway...) This approach avoids the dependency to an external system and 
also solves the "how to compact" issue, as RocksDB does it for us. Of course, 
it's more expensive (in dollars) to keep the state in the app compared to 
pushing it to cheap external storage.

Thus some food for thoughts.

> Add Checkpoint for In-memory State Store
> ----------------------------------------
>
>                 Key: KAFKA-3184
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3184
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Nikolay Izhikov
>            Priority: Major
>              Labels: user-experience
>
> Currently Kafka Streams does not make a checkpoint of the persistent state 
> store upon committing, which would be expensive since it is "stopping the 
> world" and write on disks: for example, RocksDB would require you to copy the 
> file directory to make a copy naively. 
> However, for in-memory stores checkpointing maybe doable in an asynchronous 
> manner hence it can be done quickly. And the benefit of having intermediate 
> checkpoint is to avoid restoring from scratch if standby tasks are not 
> present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to