Hey Colin,

I want to follow up on Andrew's questions and your replies.

First, yes both KIP-932 and KIP-1071 are going to use "Features".

 - https://github.com/apache/kafka/pull/19293
 - https://github.com/apache/kafka/pull/19509

I don't think it's mentioned in the KIPs (at least not for KIP-1071). If it needs to be mentioned in the KIP, I'm happy to update KIP-1071 (I would leave it to Andrew to update KIP-932 if necessary). It's a recent change and was not originally considered for KIP-1071.


To be clear, this KIP is not a generic downgrade mechanism for any KIP-584 feature. It's very specifically for metadata.version.

Isn't the point of downgrading the metadata version to be able to downgrade the broker?

If yes, I agree with Andrew that, without a proper downgrade path for everything, what users can achieve with this KIP seems quite limited.

Once KIP-932 and KIP-1071 are enabled by default and client applications use these features, there is no way to actually downgrade the brokers without breaking those applications. What do we gain by supporting metadata version downgrade if the brokers cannot be downgraded anyway?


Thoughts?


-Matthias




On 4/25/25 2:42 PM, Colin McCabe wrote:
On Fri, Apr 25, 2025, at 14:26, José Armando García Sancio wrote:
Hi Colin,

Thanks for the KIP. I have a few questions and comments.

The "initiating the downgrade" section states "It will also check that
the requested metadata version is compatible with all the other
KIP-584 features that are enabled." What do you mean by this?


Hi Jose,

So, in Kafka version 4.0, there are no KIP-584 features that rely on a specific 
metadata version. However, there could be some in the future. This is just a 
statement that if we do have these kinds of cross-feature dependencies in the 
future, we will enforce them.
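
As a purely illustrative sketch of what such a check could look like (the
feature name and levels below are invented; no such dependency exists in 4.0):

    import java.util.Map;

    // Hypothetical sketch of the cross-feature check described above. The
    // feature name and levels are made up purely to illustrate the idea.
    final class FeatureDependencyCheckSketch {
        // Minimum metadata.version level each (hypothetical) feature requires.
        private static final Map<String, Short> MIN_MV_FOR_FEATURE =
            Map.of("example.feature", (short) 20);

        static void validateMetadataDowngrade(short requestedMv,
                                              Map<String, Short> enabledFeatures) {
            for (Map.Entry<String, Short> enabled : enabledFeatures.entrySet()) {
                Short minMv = MIN_MV_FOR_FEATURE.get(enabled.getKey());
                if (minMv != null && enabled.getValue() > 0 && requestedMv < minMv) {
                    throw new IllegalArgumentException(
                        "Cannot downgrade metadata.version to " + requestedMv +
                        " while " + enabled.getKey() + " is enabled at level " +
                        enabled.getValue());
                }
            }
        }
    }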

In the same section, you have "If the metadata version is too old to
support KIP-919 controller registrations (pre 3.7.0-IV0), we will
simply assume that the controllers do support downgrade, because we
cannot check. This is similar to how metadata version upgrade is
handled in the absence of controller registrations." What happens if
this assumption is incorrect and the controller doesn't support
downgrade? Is there something that the user can do and we can document
to mitigate any issues that result from this?

This is the scenario where some controllers are running the 3.6.0 release or 
older, and some controllers are running 4.2 or newer, on a very old metadata 
version without controller registrations. There really isn't anything we can do 
to detect this since, by definition, the pre-3.7-IV3 metadata version doesn't 
have controller registrations.

Any problems that result from this should be clearable by restarting the 
ancient controller processes, though (since they will then load the snapshot at 
the older MV).


In the same section, you have "If the updateFeatures RPC specified
multiple features, the metadata version downgrade record will be
emitted last." Is there a reason why this is required? I think that
all of the records would be in the same record batch so it is not
clear to me why you need the invariant that the metadata version is
last.


It's required because in theory the other features could depend on 
metadata.version. So we want to avoid the potential invalid state where MV has 
been downgraded but the features which depend on it have not.
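
A simplified sketch of that ordering rule (real code emits FeatureLevelRecords
within a single batch; a plain name/level pair stands in for them here):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Sketch: when one UpdateFeatures request touches several features, the
    // record for metadata.version is emitted last, so no record for a feature
    // that depends on MV is ever applied after MV has already been lowered.
    final class FeatureRecordOrderingSketch {
        record FeatureLevel(String name, short level) { }

        static List<FeatureLevel> orderRecords(Map<String, Short> requestedLevels) {
            List<FeatureLevel> records = new ArrayList<>();
            requestedLevels.forEach((name, level) -> {
                if (!name.equals("metadata.version")) {
                    records.add(new FeatureLevel(name, level));
                }
            });
            Short mv = requestedLevels.get("metadata.version");
            if (mv != null) {
                records.add(new FeatureLevel("metadata.version", mv)); // always last
            }
            return records;
        }
    }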

In the "handling the downgrade" section, you have "the MetadataDelta
should contain only the things that changed since the previous
snapshot. (There are a few cases where things are appearing in
MetadataDelta even if they haven't changed since the last snapshot –
we should fix this.)" Do we have a bug jira for this? You also say that
we _should_ fix this. Do we need to fix this for this design to be
correct or is this a performance/optimization issue?

This is about performance since otherwise we end up with things like a delta 
which contains every partition, etc.

This is particularly a problem with ReplicaManager -- we don't want to restart 
all the fetchers unless we have to do that.
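
As a rough illustration of why the minimal delta matters on the broker side
(my reading of the intent, sketched against the MetadataDelta API rather than
the actual ReplicaManager code):

    import org.apache.kafka.image.MetadataDelta;
    import org.apache.kafka.image.TopicsDelta;

    // Sketch: if the delta carries no topic changes, the broker can leave the
    // fetchers alone instead of reconsidering every partition.
    final class DeltaHandlingSketch {
        void onMetadataUpdate(MetadataDelta delta) {
            TopicsDelta topicsDelta = delta.topicsDelta();
            if (topicsDelta == null) {
                return; // nothing topic-related changed: no fetcher restarts needed
            }
            // ... apply only the changed topics/partitions to the fetchers ...
        }
    }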


In the same section, you have "When a broker or controller replays the
new FeatureLevelRecord telling it to change the metadata.version, it
will immediately trigger writing a __cluster_metadata snapshot with
the new metadata version." I assume that a new snapshot will be
generated only when the metadata version decreases, is that correct?
Can you explain how a change in metadata version will be detected? For
example, the FeatureLevelRecord may be in the cluster metadata
checkpoint. In that case a new checkpoint doesn't need to be
generated. I get the impression that this solution depends on the
KRaft listener always returning the latest checkpoint in the
RaftClient#handleLoadSnapshot. If so, should we make that explicit?


Hmm, I'm a bit confused about the question. The current Raft interfaces do 
distinguish between records we load from a snapshot and records we load from 
the log. So there is no risk of confusing the FeatureLevelRecord we loaded from 
the latest snapshot with the FeatureLevelRecord we loaded from the log.
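
A simplified sketch of that distinction, using the RaftClient.Listener
interface with a generic record type (reader handling is elided):

    import org.apache.kafka.raft.BatchReader;
    import org.apache.kafka.raft.LeaderAndEpoch;
    import org.apache.kafka.raft.RaftClient;
    import org.apache.kafka.snapshot.SnapshotReader;

    // Snapshot records and log records arrive through separate callbacks, so
    // the loader always knows which source a FeatureLevelRecord came from.
    final class LoaderListenerSketch<T> implements RaftClient.Listener<T> {
        @Override
        public void handleLoadSnapshot(SnapshotReader<T> reader) {
            // Everything read here came from the latest snapshot.
        }

        @Override
        public void handleCommit(BatchReader<T> reader) {
            // Everything read here was committed to the log after that snapshot.
        }

        @Override
        public void handleLeaderChange(LeaderAndEpoch newLeader) {
            // Not relevant to the snapshot vs. log distinction.
        }
    }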


In the "lossy versus lossless downgrades" section, you have "we will
check to see if anything was lost when writing the image at the lower
metadata version. If there is, we will abort the downgrade process."
Should we be more explicit and say that the active controller will
perform the check? When you say "abort", do you mean that the active
controller will return an INVALID_UPDATE_VERSION error for the
UPDATE_FEATURES RPC and no FeatureLevelRecord will be written to the
cluster metadata partition?


The assumption is that the active controller performs all of the checks and 
returns all of the return codes.
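
In other words, from the caller's point of view an aborted lossy downgrade
would surface roughly like this (a sketch only; the target level 17 is purely
illustrative):

    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.FeatureUpdate;
    import org.apache.kafka.clients.admin.UpdateFeaturesOptions;
    import org.apache.kafka.common.errors.InvalidUpdateVersionException;

    // Sketch: the active controller rejects a lossy downgrade with
    // INVALID_UPDATE_VERSION and writes no records; the Admin client sees
    // the error on the returned future. Level 17 is illustrative.
    public class DowngradeAbortSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                FeatureUpdate update =
                    new FeatureUpdate((short) 17, FeatureUpdate.UpgradeType.SAFE_DOWNGRADE);
                try {
                    admin.updateFeatures(Map.of("metadata.version", update),
                                         new UpdateFeaturesOptions()).all().get();
                    System.out.println("metadata.version downgrade committed");
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof InvalidUpdateVersionException) {
                        System.err.println("Downgrade aborted by the controller: "
                            + e.getCause().getMessage());
                    } else {
                        throw e;
                    }
                }
            }
        }
    }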


Given that checkpoint generation is asynchronous from committing the
new metadata version, should we have metrics (or a different
mechanism) that the user can monitor to determine when it is safe to
downgrade the software version of a node?


That's a fair point. We should have some documentation about waiting for a 
snapshot file to appear with a timestamp later than the time of the RPC, I guess.
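
Until such a metric exists, a rough check along those lines could look like
this (just a sketch of the idea, not part of the KIP; it assumes snapshots
appear as *.checkpoint files in the node's __cluster_metadata-0 directory):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.time.Instant;

    // Sketch: look for a snapshot file whose modification time is later than
    // the time the downgrade RPC was issued.
    public class SnapshotWaitSketch {
        public static boolean hasSnapshotNewerThan(Path metadataLogDir, Instant rpcTime)
                throws IOException {
            try (DirectoryStream<Path> files =
                     Files.newDirectoryStream(metadataLogDir, "*.checkpoint")) {
                for (Path file : files) {
                    if (Files.getLastModifiedTime(file).toInstant().isAfter(rpcTime)) {
                        return true; // a snapshot written after the downgrade request exists
                    }
                }
            }
            return false;
        }
    }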

best,
Colin

Thanks,
--
-José
