Hey De Gao, thanks for the KIP!

As you’re probably aware, a Partition is a logical construct in Kafka. A broker hosts a partition, which is composed of physical log segments. Only the active segment is written to; the others are immutable. The concept of a Chunk sounds quite similar to our log segments.
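To make the comparison concrete, this is roughly what a single partition looks like on disk (the path and offsets here are illustrative, not taken from the KIP); each segment is named after the base offset of its first record, and only the last one is actively appended to:

    /var/kafka-logs/my-topic-0/
      00000000000000000000.log        (closed, immutable)
      00000000000000000000.index
      00000000000000000000.timeindex
      00000000000001048576.log        (closed, immutable)
      00000000000001048576.index
      00000000000001048576.timeindex
      00000000000002097152.log        (active, being written to)
      ...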
From what I can tell reading the KIP, the main difference is that a Chunk can have its own assignment and can therefore be replicated across different brokers.

> Horizontal scalability: the data was distributed more evenly to brokers
> in cluster. Also achieving a more flexible resource allocation.

I think this is only true in cases where we have a small number of partitions with a large amount of data. I have certainly seen cases where a small number of partitions can cause trouble with balancing the cluster. The idea of shuffling around older data in order to spread out the load is interesting. It does seem like it would increase the complexity of the client a bit when it comes to consuming the old data. Usually the client can just read from a single replica from the beginning of the log to the end. With this proposal, the client would need to hop around between replicas as it crossed the chunk boundaries.

> Better load balancing: The read of partition data, especially early data
> can be distributed to more nodes other than just leader nodes.

As you know, this is already possible with KIP-392. I guess the idea with the chunks is that clients would be reading older data from less busy brokers (i.e., brokers which are not the leader, or perhaps not even a follower of the active chunk). I’m not sure this would always result in better load balancing. It seems a bit situational.
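For reference, the fetch-from-follower path is driven by two existing configs; a minimal sketch (the rack names are made up):

    # broker: server.properties
    broker.rack=us-east-1a
    replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

    # consumer
    client.rack=us-east-1a

With this in place the broker directs the consumer to the closest in-sync replica, so older data can already be read from non-leaders, though only from replicas that host the whole partition, not an individual chunk.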
> Increased fault tolerance: failure of leader node will not impact read
> older data.

I don’t think this proposal changes the fault tolerance. A failure of a leader results in a failover to a follower. If a client is consuming using KIP-392, a leader failure will not affect the consumption (besides updating the client’s metadata).

--

I guess I'm missing a key point here. What problem is this trying to solve? Is it a solution for the "single partition" problem? (i.e., a topic with one partition and a lot of data)

Thanks!
David A

On Tue, Dec 31, 2024 at 3:24 PM De Gao <d...@live.co.uk> wrote:

> Thanks for the comments. I have updated the proposal to compare with
> tiered storage and fetch from replica. Please check.
>
> Thanks.
>
> On 11 December 2024 08:51:43 GMT, David Jacot <dja...@confluent.io.INVALID>
> wrote:
> >Hi,
> >
> >Thanks for the KIP. The community is pretty busy with the Apache Kafka 4.0
> >release so I suppose that no one really had the time to engage in
> >reviewing the KIP yet. Sorry for this!
> >
> >I just read the motivation section. I think that it is an interesting
> >idea. However, I wonder if this is still needed now that we have tier
> >storage in place. One of the big selling points of tier storage was that
> >clusters don't have to replicate tiered data anymore. Could you perhaps
> >extend the motivation of the KIP to include tier storage in the
> >reflection?
> >
> >Best,
> >David
> >
> >On Tue, Dec 10, 2024 at 10:46 PM De Gao <d...@live.co.uk> wrote:
> >
> >> Hi All:
> >>
> >> There was no discussion in the past week. Just want to double check if
> >> I missed anything? What should be the expectations on KIP discussion?
> >>
> >> Thank you!
> >>
> >> De Gao
> >>
> >> On 1 December 2024 19:36:37 GMT, De Gao <d...@live.co.uk> wrote:
> >> >Hi All:
> >> >
> >> >I would like to start the discussion of KIP-1114 Introducing Chunk in
> >> >Partition.
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1114%3A+Introducing+Chunk+in+Partition
> >> >
> >> >This KIP is complicated so I expect the discussion will take a longer
> >> >time.
> >> >
> >> >Thank you in advance.
> >> >
> >> >De Gao

-- 
David Arthur