Hi Alyssa,

For scenario 2: Yes, this is the expected state for a cluster that was upgraded from pre-KIP-1170 to post-KIP-1170.

For scenario 3: Yes, this would be the expected state on startup for newly provisioned clusters that implement KIP-1170. I think this is also the case if a user's existing cluster was upgraded to post-1170 and formatted again without `--ignore-formatted`, however unlikely that might be.

For scenario 4: We say this is impossible from a formatting perspective because the implementation of KIP-1170 no longer creates the `bootstrap.checkpoint` file, and instead writes the metadata records to the `0-0.checkpoint` when formatting. Therefore, it should be impossible to be in a state where `bootstrap.checkpoint` exists and `0-0.checkpoint` also has metadata records. If a user upgrades to post-1170 and does not format, they are in scenario 2. If a user upgrades to post-1170 and formats again, they are in scenario 3.

> For instance, it seems I should interpret the above scenario as when the
> cluster existed pre-KIP-1170, but it's possible for the controller not to
> load a snapshot at epoch 0 and offset 0 solely because that snapshot was
> deleted due to retention.

I believe the fact that the `0-0.checkpoint` has been deleted should tell us we have already written the bootstrap records to the log, either from the `0-0.checkpoint` or from the `bootstrap.checkpoint`, depending on which software version we formatted with. This is because, in order to delete the `0-0.checkpoint`, there must exist at least one `.checkpoint` file with a higher offset.

> If the controller doesn't load a snapshot at epoch 0 and offset 0

I think this language refers to the pre-KIP-1170 case (i.e. scenario 2 from above), but correct me if I am wrong.

Best,
Kevin

On Wed, Aug 27, 2025 at 5:12 PM Alyssa Huang <ahu...@confluent.io.invalid> wrote:

> Thanks Jose and Kevin,
>
> Wanted to confirm my understanding, for the scenarios under the
> Compatibility section:
>
> > 2. bootstrap.checkpoint exist with metadata records and the zero checkpoint
> > exists but doesn't contain any metadata records - In this case the
> > controller will behave as it does today. The controller will be able to
> > identify this case because RaftClient.Listener#handleLoadSnapshot will ask
> > to load the zero checkpoint but it will be empty, no metadata records.
>
> This is the expected state for a cluster which upgraded from pre-KIP-1170
> to post-KIP-1170?
>
> > 3. bootstrap.checkpoint doesn't exist and the zero checkpoint exists with
> > metadata records - In this case the controller will use the zero
> > checkpoint's metadata records to and write them to the log in a transaction
> > or single atomic batch like the controller does pre-KIP-1170.
>
> And this would be the expected state on startup for newly provisioned
> clusters that implement KIP-1170?
>
> > 4. bootstrap.checkpoint exist and the zero checkpoint exists with metadata
> > records - This should not be possible from a formatting point of view but
> > the active controller will handle this case the same as bullet 3 but with
> > the addition of writing a WARN message to the controller log.
>
> Can we explain why this should not be possible? It might not be obvious to
> readers
>
> Also, what do you think about breaking the Proposed Changes section into
> two examples? How new changes impact newly provisioned clusters vs existing
> clusters (with bootstrap.checkpoint).
>
> > If the controller doesn't load a snapshot at epoch 0 and offset 0, the
> > controller will load the bootstrap.checkpoint and rewrite the bootstrapping
> > record to the cluster metadata partition if they haven't been successfully
> > written in the past.
>
> For instance, it seems I should interpret the above scenario as when the
> cluster existed pre-KIP-1170, but it's possible for the controller not to
> load a snapshot at epoch 0 and offset 0 solely because that snapshot was
> deleted due to retention.
>
> Best,
> Alyssa
>
> On Wed, Aug 27, 2025 at 1:23 PM José Armando García Sancio
> <jsan...@confluent.io.invalid> wrote:
>
> > Hi all,
> >
> > I would like to start a discussion on KIP-1170: Unify cluster metadata
> > bootstrapping:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1170%3A+Unify+cluster+metadata+bootstrapping
> >
> > This KIP unifies the controller's bootstrap checkpoint and KRaft zero
> > checkpoint by moving the starting metadata from bootstrap.checkpoint
> > to the zero checkpoint.
> >
> > Thanks,
> >
> > --
> > -José
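
P.S. For readers skimming the thread, the mapping from on-disk state to the Compatibility scenarios discussed here can be sketched in Java. This is only an illustration, not code from Kafka: the class, enum, and method names are hypothetical, and the two boolean inputs stand in for whatever checks the controller actually performs. It also assumes the zero checkpoint is still present (possibly empty), so it does not model the case where `0-0.checkpoint` was deleted due to retention. Only scenarios 2 through 4 are quoted in this thread, so the remaining combination is left unclassified.

```java
// Hypothetical sketch; none of these names come from the Kafka code base.
class BootstrapScenarios {

    /** Labels for the Compatibility scenarios quoted in this thread. */
    enum Scenario {
        SCENARIO_2,            // upgraded cluster that was not re-formatted
        SCENARIO_3,            // formatted with post-KIP-1170 software
        SCENARIO_4,            // should not occur via formatting; handled like 3 plus a WARN
        NOT_COVERED_IN_THREAD  // neither file carries metadata records
    }

    /**
     * Classifies the startup state from two simplified observations:
     * whether bootstrap.checkpoint exists, and whether the zero checkpoint
     * (0-0.checkpoint) contains metadata records.
     */
    static Scenario classify(boolean bootstrapCheckpointExists,
                             boolean zeroCheckpointHasMetadataRecords) {
        if (bootstrapCheckpointExists && zeroCheckpointHasMetadataRecords) {
            return Scenario.SCENARIO_4;
        }
        if (bootstrapCheckpointExists) {
            return Scenario.SCENARIO_2;
        }
        if (zeroCheckpointHasMetadataRecords) {
            return Scenario.SCENARIO_3;
        }
        return Scenario.NOT_COVERED_IN_THREAD;
    }

    public static void main(String[] args) {
        // Walk through all four input combinations.
        boolean[] values = {true, false};
        for (boolean bootstrap : values) {
            for (boolean zeroHasRecords : values) {
                System.out.println("bootstrap.checkpoint exists=" + bootstrap
                        + ", zero checkpoint has records=" + zeroHasRecords
                        + " -> " + classify(bootstrap, zeroHasRecords));
            }
        }
    }
}
```

The both-present case is checked first so that it is never mistaken for scenario 2; per the quoted KIP text, a real controller would treat it like scenario 3 and additionally log a WARN message.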