[ 
https://issues.apache.org/jira/browse/KAFKA-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6643.
------------------------------------
    Resolution: Won't Fix

> Warm up new replicas from scratch when changelog topic has LIMITED retention 
> time
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-6643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6643
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Navinder Brar
>            Priority: Major
>
> In the current setup, Kafka Streams uses changelog Kafka topics (internal 
> topics holding all the data for a store) to build the state of replicas. 
> So, if we keep the number of standby replicas at 1, we get additional 
> availability for persistent state stores, since changelog topics are also 
> replicated according to the broker replication policy, but that also means 
> we are using at least 4 times the space (1 master store, 1 replica store, 
> 1 changelog, 1 changelog replica). 
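> For context, standby replicas are enabled through a single Streams config; 
> a minimal sketch, with placeholder application id and broker address:
>
>     import java.util.Properties;
>     import org.apache.kafka.streams.StreamsConfig;
>
>     Properties props = new Properties();
>     // Placeholders for this sketch
>     props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
>     props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
>     // Keep 1 standby replica per stateful task (a second local store copy)
>     props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);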
> Now, if we have a year's worth of data in the persistent stores (RocksDB), 
> we don't want the changelog topics to hold a year's worth of data as well, 
> as that puts an unnecessary space burden on the brokers. To scale a Kafka 
> Streams application holding 200-300 TB of data, we would have to scale the 
> Kafka brokers too. We want to reduce this dependency and find ways to use 
> the changelog topic as just a queue, retaining only 2 or 3 days of data, 
> and warm up new replicas from scratch in some other way.
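> To illustrate the "changelog as a queue" idea: changelog topic configs can 
> already be overridden per store. A sketch, assuming a hypothetical store 
> named "my-store", that switches the changelog from compaction to a 3-day 
> time-based retention; note that with today's restore path, records older 
> than the window can no longer be replayed, which is exactly why a separate 
> warm-up mechanism is needed:
>
>     import java.util.Map;
>     import org.apache.kafka.common.config.TopicConfig;
>     import org.apache.kafka.common.utils.Bytes;
>     import org.apache.kafka.streams.kstream.Materialized;
>     import org.apache.kafka.streams.state.KeyValueStore;
>
>     Materialized<String, Long, KeyValueStore<Bytes, byte[]>> store =
>         Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("my-store")
>             .withLoggingEnabled(Map.of(
>                 // delete instead of compact, so the topic acts as a queue
>                 TopicConfig.CLEANUP_POLICY_CONFIG, "delete",
>                 // retain roughly 3 days of data
>                 TopicConfig.RETENTION_MS_CONFIG,
>                 String.valueOf(3L * 24 * 60 * 60 * 1000)));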
> I have a few proposals in that respect.
> 1. Use a new Kafka topic per partition that needs to be warmed up on the 
> fly (i.e., when the node containing that partition crashes). Produce into 
> this topic from another replica/active task and build the new replica from 
> this topic.
> 2. Use peer-to-peer file transfer (such as SFTP), since RocksDB can create 
> backups, which can be transferred from the source node to the destination 
> node when a new replica has to be built from scratch (see the 
> backup/restore sketch after this list).
> 3. Use HDFS as an intermediary instead of a Kafka topic, keeping scheduled 
> backups for each partition and using those to build new replicas.
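> For proposal 2, a minimal sketch of the RocksDB side using the BackupEngine 
> Java API (directory paths are placeholders; older RocksDB versions name the 
> options class BackupableDBOptions instead of BackupEngineOptions):
>
>     import org.rocksdb.*;
>
>     public class ReplicaWarmup {
>         static { RocksDB.loadLibrary(); }
>
>         // Source node: write a consistent backup of the local store
>         static void backup(RocksDB db, String backupDir) throws RocksDBException {
>             try (BackupEngineOptions opts = new BackupEngineOptions(backupDir);
>                  BackupEngine engine = BackupEngine.open(Env.getDefault(), opts)) {
>                 engine.createNewBackup(db, true); // flush memtables first
>             }
>         }
>
>         // Destination node: restore once the backup dir has been shipped
>         // over (e.g., via SFTP)
>         static void restore(String backupDir, String dbDir) throws RocksDBException {
>             try (BackupEngineOptions opts = new BackupEngineOptions(backupDir);
>                  BackupEngine engine = BackupEngine.open(Env.getDefault(), opts);
>                  RestoreOptions restoreOpts = new RestoreOptions(false)) {
>                 engine.restoreDbFromLatestBackup(dbDir, dbDir, restoreOpts);
>             }
>         }
>     }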



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
