[
https://issues.apache.org/jira/browse/KAFKA-18168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907634#comment-17907634
]
Keshan Pathirana commented on KAFKA-18168:
------------------------------------------
Hi [~Sg12] [~mjsax],
I hope you’re both doing well. This is my first contribution to the Kafka
project, so please bear with me if I ask too many questions.
That said, wouldn’t the issue described in this ticket be resolved if we
introduced a config option to override the default behavior of waiting for
10,000 events before writing a checkpoint file?
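To make the question concrete, here is a minimal sketch of what a configurable threshold might look like. It is not the actual Kafka code: the class name, the `checkpointNeeded` signature, the `threshold` parameter, and the use of plain strings as partition keys are all assumptions made for illustration, and it assumes the offset delta is measured from the offsets recorded at the end of restore.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and signatures are hypothetical, not Kafka's API.
public class CheckpointThresholdSketch {

    // Default mirroring the 10,000-event behavior described in this ticket.
    static final long DEFAULT_OFFSET_DELTA_THRESHOLD = 10_000L;

    // Returns true if a checkpoint file should be written.
    // enforce:   caller explicitly requests a checkpoint (e.g. after restore or on close)
    // threshold: hypothetical configurable replacement for the fixed 10K value
    static boolean checkpointNeeded(final boolean enforce,
                                    final Map<String, Long> oldOffsets,
                                    final Map<String, Long> newOffsets,
                                    final long threshold) {
        if (enforce) {
            return true;
        }
        // Sum how far each partition has advanced since the last snapshot.
        long totalDelta = 0L;
        for (final Map.Entry<String, Long> entry : newOffsets.entrySet()) {
            totalDelta += entry.getValue() - oldOffsets.getOrDefault(entry.getKey(), 0L);
        }
        return totalDelta > threshold;
    }

    public static void main(final String[] args) {
        final Map<String, Long> afterRestore = new HashMap<>();
        afterRestore.put("global-topic-0", 100_000L);

        final Map<String, Long> afterSomeTraffic = new HashMap<>();
        afterSomeTraffic.put("global-topic-0", 105_000L);

        // 5,000 new events: below the default threshold, above a lowered one.
        System.out.println(checkpointNeeded(false, afterRestore, afterSomeTraffic,
                DEFAULT_OFFSET_DELTA_THRESHOLD));
        System.out.println(checkpointNeeded(false, afterRestore, afterSomeTraffic, 1_000L));

        // No new traffic at all: the delta is zero, so even a threshold of 0 does not trigger.
        System.out.println(checkpointNeeded(false, afterRestore, afterRestore, 0L));

        // An explicit request (for example, once restore completes) always triggers.
        System.out.println(checkpointNeeded(true, afterRestore, afterRestore,
                DEFAULT_OFFSET_DELTA_THRESHOLD));
    }
}
```

If the real logic behaves like this sketch, a lower threshold would help in the restart case once some traffic arrives, but the "Lack of Traffic" case would probably still need an explicit checkpoint when restore completes. Please correct me if I have misread the code.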
> GlobalKTable does not checkpoint restored offsets until next 10K events
> -----------------------------------------------------------------------
>
> Key: KAFKA-18168
> URL: https://issues.apache.org/jira/browse/KAFKA-18168
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 3.4.1, 3.8.1
> Reporter: Sergey Zyrianov
> Assignee: Keshan Pathirana
> Priority: Minor
>
> As in https://issues.apache.org/jira/browse/KAFKA-5241, a state of
> considerable size is kept on the topic that backs a GlobalKTable. Restoring
> the GlobalKTable takes minutes before it is operational. After a successful
> restore, the checkpoint file is not created until a further 10K events occur
> on the topic.
> The following scenario illustrates the issue:
> # {*}Scaling Out{*}: When a new instance (e.g., pod X) is added to an
> already running set of instances (pods 0...X-1), the new instance will
> restore the state successfully. However, it will not create a checkpoint file
> until 10K events are processed on the {{GlobalKTable}} topic.
> # {*}Lack of Traffic{*}: If there is no new traffic on the {{GlobalKTable}}
> topic, there is no mechanism to force the creation of the checkpoint file.
> The state remains uncheckpointed. Ref
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateManagerUtil.java#L78C35-L78C72]
> # {*}Instance Restart{*}: If the new instance (pod X) is restarted (for
> example, due to an update) before 10K events have been processed, it will
> have to restore
> the entire state from the topic again, leading to the same time-consuming
> restoration process. This issue persists across restarts.
> IMO, checkpointing during the restore process and upon completion/close is
> missing from the current implementation.
>