Hi Alyssa,

For scenario 2: Yes, this is the expected state for a cluster that was
upgraded from pre-KIP-1170 to post-KIP-1170.

For scenario 3: Yes, this would be the expected state on startup for newly
provisioned clusters that implement KIP-1170. I think this is also the case
if a user's existing cluster was upgraded to post-KIP-1170 and then formatted
again without `--ignore-formatted`, however unlikely that might be.

For scenario 4: We say this is impossible from a formatting perspective
because the implementation of KIP-1170 no longer creates the
`bootstrap.checkpoint` file, and will instead write the metadata records to
the `0-0.checkpoint` when formatting. Therefore, it should be impossible to
be in a state where both `bootstrap.checkpoint` exists and `0-0.checkpoint`
has metadata records. If a user upgrades to post-1170, and they do not
format, they are in scenario 2. If a user upgrades to post-1170, and
formats again, they are in scenario 3.
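
To make the three cases concrete, here is a rough sketch of how I picture the
on-disk states mapping to scenarios 2 to 4. All of the names below
(`BootstrapStateSketch`, `ZeroCheckpoint`, `Scenario`, `classify`) are
hypothetical and only for illustration; this is not the KIP's implementation
or any existing Kafka API.

```java
// Hypothetical sketch: none of these names come from the Kafka codebase.
final class BootstrapStateSketch {
    /** What the zero checkpoint (0-0.checkpoint) looks like on disk. */
    enum ZeroCheckpoint { MISSING, EMPTY, HAS_METADATA_RECORDS }

    enum Scenario {
        SCENARIO_2_UPGRADED_WITHOUT_REFORMAT, // bootstrap.checkpoint present, zero checkpoint empty
        SCENARIO_3_FORMATTED_POST_KIP_1170,   // no bootstrap.checkpoint, zero checkpoint has records
        SCENARIO_4_UNEXPECTED_BOTH_PRESENT,   // both present; handled like 3 plus a WARN
        NOT_DISCUSSED_HERE                    // any other combination is outside this thread
    }

    static Scenario classify(boolean bootstrapCheckpointExists, ZeroCheckpoint zero) {
        if (zero == ZeroCheckpoint.HAS_METADATA_RECORDS) {
            return bootstrapCheckpointExists
                ? Scenario.SCENARIO_4_UNEXPECTED_BOTH_PRESENT
                : Scenario.SCENARIO_3_FORMATTED_POST_KIP_1170;
        }
        if (zero == ZeroCheckpoint.EMPTY && bootstrapCheckpointExists) {
            return Scenario.SCENARIO_2_UPGRADED_WITHOUT_REFORMAT;
        }
        return Scenario.NOT_DISCUSSED_HERE;
    }

    public static void main(String[] args) {
        // Upgraded from pre-KIP-1170 and never formatted again.
        System.out.println(classify(true, ZeroCheckpoint.EMPTY));
        // Newly provisioned (or re-formatted) with post-KIP-1170 software.
        System.out.println(classify(false, ZeroCheckpoint.HAS_METADATA_RECORDS));
    }
}
```

The point of the sketch is that post-KIP-1170 formatting only ever produces
the second input combination, which is why scenario 4 should not arise from
formatting.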

> For instance, it seems I should interpret the above scenario as when the
> cluster existed pre-KIP-1170, but it's possible for the controller not to
> load a snapshot at epoch 0 and offset 0 solely because that snapshot was
> deleted due to retention.

I believe the fact that the `0-0.checkpoint` has been deleted tells us that
the bootstrap records have already been written to the log, either from the
`0-0.checkpoint` or from the `bootstrap.checkpoint`, depending on which
software version the cluster was formatted with. This is because, in order to
delete the `0-0.checkpoint`, there must exist at least one `.checkpoint` file
with a higher offset.
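
As a minimal sketch of that invariant, assuming checkpoint files are named
`<offset>-<epoch>.checkpoint` (the class and method names here are
hypothetical, and this is not the actual KRaft snapshot cleaning code):

```java
import java.util.List;

// Hypothetical sketch of the retention argument; not Kafka's real cleaner logic.
final class ZeroCheckpointRetentionSketch {
    private static final String SUFFIX = ".checkpoint";

    /** Parses the offset from a name of the assumed form "<offset>-<epoch>.checkpoint". */
    static long offsetOf(String fileName) {
        String base = fileName.substring(0, fileName.length() - SUFFIX.length());
        return Long.parseLong(base.substring(0, base.indexOf('-')));
    }

    /**
     * The zero checkpoint is only eligible for deletion when some other
     * checkpoint with a higher offset exists.
     */
    static boolean zeroCheckpointMayBeDeleted(List<String> fileNames) {
        return fileNames.stream()
            .filter(name -> name.endsWith(SUFFIX))
            .filter(name -> name.contains("-")) // skips bootstrap.checkpoint
            .mapToLong(ZeroCheckpointRetentionSketch::offsetOf)
            .anyMatch(offset -> offset > 0);
    }

    public static void main(String[] args) {
        System.out.println(zeroCheckpointMayBeDeleted(List.of("0-0.checkpoint")));
        System.out.println(zeroCheckpointMayBeDeleted(List.of("0-0.checkpoint", "100-3.checkpoint")));
    }
}
```

A checkpoint at a higher offset implies the log has advanced past offset 0,
which is the basis for concluding that the bootstrap records were already
written.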

> If the controller doesn't load a snapshot at epoch 0 and offset 0

I think this language refers to a pre-KIP-1170 scenario (i.e. scenario 2
above), but correct me if I am wrong.

Best,
Kevin

On Wed, Aug 27, 2025 at 5:12 PM Alyssa Huang <ahu...@confluent.io.invalid>
wrote:

> Thanks Jose and Kevin,
>
> Wanted to confirm my understanding, for the scenarios under the
> Compatibility section:
>
> > 2. bootstrap.checkpoint exist with metadata records and the zero
> > checkpoint exists but doesn't contain any metadata records - In this case
> > the controller will behave as it does today. The controller will be able
> > to identify this case because RaftClient.Listener#handleLoadSnapshot will
> > ask to load the zero checkpoint but it will be empty, no metadata records.
>
> This is the expected state for a cluster which upgraded from pre-KIP-1170
> to post-KIP-1170?
>
>
> > 3. bootstrap.checkpoint doesn't exist and the zero checkpoint exists with
> > metadata records - In this case the controller will use the zero
> > checkpoint's metadata records to and write them to the log in a
> > transaction or single atomic batch like the controller does pre-KIP-1170.
>
> And this would be the expected state on startup for newly provisioned
> clusters that implement KIP-1170?
>
> > 4. bootstrap.checkpoint exist and the zero checkpoint exists with metadata
> > records - This should not be possible from a formatting point of view but
> > the active controller will handle this case the same as bullet 3 but with
> > the addition of writing a WARN message to the controller log.
>
> Can we explain why this should not be possible? It might not be obvious to
> readers
>
> Also, what do you think about breaking the Proposed Changes section into
> two examples? How new changes impact newly provisioned clusters vs existing
> clusters (with bootstrap.checkpoint).
>
> > If the controller doesn't load a snapshot at epoch 0 and offset 0, the
> > controller will load the bootstrap.checkpoint and rewrite the
> > bootstrapping record to the cluster metadata partition if they haven't
> > been successfully written in the past.
>
> For instance, it seems I should interpret the above scenario as when the
> cluster existed pre-KIP-1170, but it's possible for the controller not to
> load a snapshot at epoch 0 and offset 0 solely because that snapshot was
> deleted due to retention.
>
> Best,
> Alyssa
>
> On Wed, Aug 27, 2025 at 1:23 PM José Armando García Sancio
> <jsan...@confluent.io.invalid> wrote:
>
> > Hi all,
> >
> > I would like to start a discussion on KIP-1170: Unify cluster metadata
> > bootstrapping:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1170%3A+Unify+cluster+metadata+bootstrapping
> >
> > This KIP unifies the controller's bootstrap checkpoint and KRaft zero
> > checkpoint by moving the starting metadata from bootstrap.checkpoint
> > to the zero checkpoint.
> >
> > Thanks,
> >
> > --
> > -José
> >
>
