Hey De Gao, thanks for the KIP!

As you’re probably aware, a Partition is a logical construct in Kafka. A broker hosts a partition, which is composed of physical log segments. Only the active segment is written to; the others are immutable. The concept of a Chunk sounds quite similar to our log segments.
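To make the comparison concrete, this is roughly what a single partition looks like on disk (the path and offsets here are illustrative, not taken from the KIP); each segment is named after the base offset of its first record, and only the last one is actively appended to:

    /var/kafka-logs/my-topic-0/
      00000000000000000000.log        (closed, immutable)
      00000000000000000000.index
      00000000000000000000.timeindex
      00000000000001048576.log        (closed, immutable)
      00000000000001048576.index
      00000000000001048576.timeindex
      00000000000002097152.log        (active, being written to)
      ...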
From what I can tell reading the KIP, the main difference is that a Chunk can have its own assignment and can therefore be replicated across different brokers.

> Horizontal scalability: the data was distributed more evenly to brokers
> in cluster. Also achieving a more flexible resource allocation.

I think this is only true in cases where we have a small number of partitions with a large amount of data. I have certainly seen cases where a small number of partitions can cause trouble with balancing the cluster. The idea of shuffling around older data in order to spread out the load is interesting. It does seem like it would increase the complexity of the client a bit when it comes to consuming the old data. Usually the client can just read from a single replica from the beginning of the log to the end. With this proposal, the client would need to hop around between replicas as it crossed the chunk boundaries.

> Better load balancing: The read of partition data, especially early data
> can be distributed to more nodes other than just leader nodes.

As you know, this is already possible with KIP-392. I guess the idea with the chunks is that clients would be reading older data from less busy brokers (i.e., brokers which are not the leader, or perhaps not even a follower of the active chunk). I’m not sure this would always result in better load balancing. It seems a bit situational.
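For reference, the fetch-from-follower path is driven by two existing configs; a minimal sketch (the rack names are made up):

    # broker: server.properties
    broker.rack=us-east-1a
    replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

    # consumer
    client.rack=us-east-1a

With this in place the broker directs the consumer to the closest in-sync replica, so older data can already be read from non-leaders, though only from replicas that host the whole partition, not an individual chunk.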
> Increased fault tolerance: failure of leader node will not impact read
> older data.

I don’t think this proposal changes the fault tolerance. A failure of a leader results in a failover to a follower. If a client is consuming using KIP-392, a leader failure will not affect the consumption (besides updating the client’s metadata).

--

I guess I'm missing a key point here. What problem is this trying to solve? Is it a solution for the "single partition" problem? (i.e., a topic with one partition and a lot of data)

Thanks!
David A

On Tue, Dec 31, 2024 at 3:24 PM De Gao <d...@live.co.uk> wrote:

> Thanks for the comments. I have updated the proposal to compare with
> tiered storage and fetch from replica. Please check.
>
> Thanks.
>
> On 11 December 2024 08:51:43 GMT, David Jacot <dja...@confluent.io.INVALID>
> wrote:
> >Hi,
> >
> >Thanks for the KIP. The community is pretty busy with the Apache Kafka 4.0
> >release so I suppose that no one really had the time to engage in
> >reviewing the KIP yet. Sorry for this!
> >
> >I just read the motivation section. I think that it is an interesting
> >idea. However, I wonder if this is still needed now that we have tier
> >storage in place. One of the big selling points of tier storage was that
> >clusters don't have to replicate tiered data anymore. Could you perhaps
> >extend the motivation of the KIP to include tier storage in the
> >reflection?
> >
> >Best,
> >David
> >
> >On Tue, Dec 10, 2024 at 10:46 PM De Gao <d...@live.co.uk> wrote:
> >
> >> Hi All:
> >>
> >> There was no discussion in the past week. Just want to double check if
> >> I missed anything? What should be the expectations on KIP discussion?
> >>
> >> Thank you!
> >>
> >> De Gao
> >>
> >> On 1 December 2024 19:36:37 GMT, De Gao <d...@live.co.uk> wrote:
> >> >Hi All:
> >> >
> >> >I would like to start the discussion of KIP-1114 Introducing Chunk in
> >> >Partition.
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1114%3A+Introducing+Chunk+in+Partition
> >> >
> >> >This KIP is complicated so I expect the discussion will take a longer
> >> >time.
> >> >
> >> >Thank you in advance.
> >> >
> >> >De Gao

-- 
David Arthur