[ 
https://issues.apache.org/jira/browse/KAFKA-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057100#comment-18057100
 ] 

Karsten Stöckmann edited comment on KAFKA-19992 at 2/10/26 10:38 AM:
---------------------------------------------------------------------

Thanks a lot [~mjsax] for taking the time to bring this up. This is based on 
our observations while 'backfilling' state stores from a huge pile of 
pre-existing data. Specifically, we are implementing a CDC stack utilizing 
Debezium and Kafka Streams. Starting the streaming pipeline from scratch means 
operating on pre-filled topics containing millions of records before regular 
streaming begins. This backfill phase is extremely difficult to operate: even 
widely scaled-out Streams applications frequently die with OutOfMemory errors 
caused by restore buffers whose memory consumption cannot be effectively bounded.

A means to limit the memory consumption of the restore path would vastly improve 
operability whenever pipeline restarts become necessary, for whatever reason.

I'd be happy to provide additional context or information if needed.



> Allow to configure max memory for restore path
> ----------------------------------------------
>
>                 Key: KAFKA-19992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19992
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>              Labels: needs-kip
>
> Kafka Streams allows users to configure many different memory limits, to 
> proactively manage memory usage and avoid OutOfMemory errors.
> However, on the restore code path, there is no such memory management. The 
> `StoreChangelogReader` fetches a batch of messages and buffers them inside 
> `ChangelogMetadata`, and no limit is enforced on this buffer.
> We could either re-use the existing config for the input topic buffer 
> (`buffered.records.per.partition`) or introduce a new config (the latter would 
> require a KIP though). – Even if we re-use the existing config, it's not 100% 
> clear whether a KIP would be needed or not?
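For context, the existing input-buffer configs mentioned above (which today do not cover the restore path) are set like any other Streams property. A minimal sketch using only `java.util.Properties`; the application id, broker address, and the chosen limit values are placeholders, not recommendations:

```java
import java.util.Properties;

public class RestoreBufferConfig {

    // Builds a Streams config with the existing input-buffer limits set.
    // Note: per KAFKA-19992, these limits apply to regular processing only,
    // NOT to the StoreChangelogReader restore buffers.
    public static Properties baseProps() {
        Properties props = new Properties();
        props.put("application.id", "cdc-backfill");          // placeholder
        props.put("bootstrap.servers", "localhost:9092");     // placeholder
        // Per-partition record cap on the input buffer:
        props.put("buffered.records.per.partition", "1000");
        // Total input-buffer byte cap (KIP-770), here 32 MiB:
        props.put("input.buffer.max.bytes", "33554432");
        return props;
    }

    public static void main(String[] args) {
        Properties props = baseProps();
        System.out.println(props.getProperty("buffered.records.per.partition"));
        System.out.println(props.getProperty("input.buffer.max.bytes"));
    }
}
```

Re-using `buffered.records.per.partition` for the restore buffer would make the cap above also bound `ChangelogMetadata` buffering; a new, restore-specific config would instead need a KIP as noted.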



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
