Hello Abhijeet and Jun, I have been mulling this KIP over a bit more in recent days!
re: Jun I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in retrospect it should have been fairly obvious. I would need to go an update KIP-1005 myself then, thank you for giving the useful reference! 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and fetch from latest-tiered-offset + 1 only for new replicas (or replicas which experienced a disk failure) to decrease the time a partition spends in under-replicated state. In other words, a follower which has just fallen out of ISR, but has local data will continue using today's Tiered Storage replication protocol (i.e. fetching from earliest-local). I further believe he has taken this approach so that local state of replicas which have just fallen out of ISR isn't forcefully wiped thus leading to situation 1. Abhijeet, have I understood (and summarised) what you are proposing correctly? 5. I think in today's Tiered Storage we know the leader's log-start-offset from the FetchResponse and we can learn its local-log-start-offset from the ListOffsets by asking for earliest-local timestamp (-4). But granted, this ought to be added as an additional API call in the KIP. re: Abhijeet 101. I am still a bit confused as to why you want to include a new offset (i.e. pending-upload-offset) when you yourself mention that you could use an already existing offset (i.e. last-tiered-offset + 1). In essence, you end your Motivation with "In this KIP, we will focus only on the follower fetch protocol using the *last-tiered-offset*" and then in the following sections you talk about pending-upload-offset. I understand this might be classified as an implementation detail, but if you introduce a new offset (i.e. pending-upload-offset) you have to make a change to the ListOffsets API (i.e. introduce -6) and thus document it in this KIP as such. However, the last-tiered-offset ought to already be exposed as part of KIP-1005 (under implementation). Am I misunderstanding something here? Best, Christo On Thu, 4 Apr 2024 at 19:37, Jun Rao <j...@confluent.io.invalid> wrote: > Hi, Abhijeet, > > Thanks for the KIP. Left a few comments. > > 1. "A drawback of using the last-tiered-offset is that this new follower > would possess only a limited number of locally stored segments. Should it > ascend to the role of leader, there is a risk of needing to fetch these > segments from the remote storage, potentially impacting broker > performance." > Since we support consumers fetching from followers, this is a potential > issue on the follower side too. In theory, it's possible for a segment to > be tiered immediately after rolling. In that case, there could be very > little data after last-tiered-offset. It would be useful to think through > how to address this issue. > > 2. ListOffsetsRequest: > 2.1 Typically, we need to bump up the version of the request if we add a > new value for timestamp. See > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633 > . > 2.2 Since this changes the inter broker request protocol, it would be > useful to have a section on upgrade (e.g. new IBP/metadata.version). > > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the > last-tiered-offset from the leader, and make a separate leader call to > fetch leader epoch for the following offset." > Why do we need to make a separate call for the leader epoch? > ListOffsetsResponse include both the offset and the corresponding epoch. > > 4. "Check if the follower replica is empty and if the feature to use > last-tiered-offset is enabled." > Why do we need to check if the follower replica is empty? > > 5. It can be confirmed by checking if the leader's Log-Start-Offset is the > same as the Leader's Local-Log-Start-Offset. > How does the follower know Local-Log-Start-Offset? > > Jun > > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar <abhijeet.cse....@gmail.com > > > wrote: > > > Hi Christo, > > > > Thanks for reviewing the KIP. > > > > The follower needs the earliest-pending-upload-offset (and the > > corresponding leader epoch) from the leader. > > This is the first offset the follower will have locally. > > > > Regards, > > Abhijeet. > > > > > > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov <christolo...@gmail.com> > > wrote: > > > > > Heya! > > > > > > First of all, thank you very much for the proposal, you have explained > > the > > > problem you want solved very well - I think a faster bootstrap of an > > empty > > > replica is definitely an improvement! > > > > > > For my understanding, which concrete offset do you want the leader to > > give > > > back to a follower - earliest-pending-upload-offset or the > > > latest-tiered-offset? If it is the second, then I believe KIP-1005 > ought > > to > > > already be exposing that offset as part of the ListOffsets API, no? > > > > > > Best, > > > Christo > > > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar < > abhijeet.cse....@gmail.com > > > > > > wrote: > > > > > > > Hi All, > > > > > > > > I have created KIP-1023 to introduce follower fetch from tiered > offset. > > > > This feature will be helpful in significantly reducing Kafka > > > > rebalance/rebuild times when the cluster is enabled with tiered > > storage. > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset > > > > > > > > Feedback and suggestions are welcome. > > > > > > > > Regards, > > > > Abhijeet. > > > > > > > > > >