Hi Colin, Thanks for the KIP. I have a few questions and comments.
The "initiating the downgrade" section states "It will also check that the requested metadata version is compatible with all the other KIP-584 features that are enabled." What do you mean by this? In the same section, you have "If the metadata version is too old to support KIP-919 controller registrations (pre 3.7.0-IV0), we will simply assume that the controllers do support downgrade, because we cannot check. This is similar to how metadata version upgrade is handled in the absence of controller registrations." What happens if this assumption is incorrect, the controller doesn't support downgrade? Is there something that the user can do and we can document to mitigate any issues that result from this? In the same section, you have "If the updateFeatures RPC specified multiple features, the metadata version downgrade record will be emitted last." Is there a reason why this is required? I think that all of the records would be in the same record batch so it is not clear to me why you need the invariant that the metadata version is last. In the "handling the downgrade" section, you have "the MetadataDelta should contain only the things that changed since the previous snapshot. (There are a few cases where things are appearing in MetadataDelta even if they haven't changed since the last snapshot – we should fix this.)" Do we have bug jira for this? You also say that we _should_ fix this. Do we need to fix this for this design to be correct or is this a performance/optimization issue? In the same section, you have "When a broker or controller replays the new FeatureLevelRecord telling it to change the metadata.version, it will immediately trigger a writing a __cluster_metadata snapshot with the new metadata version." I assume that a new snapshot will be generated only when the metadata version decreases, is that correct? Can you explain how a change in metadata version will be detected? For example, the FeatureLevelRecord may be in the cluster metadata checkpoint. In that case a new checkpoint doesn't need to be generated. I get the impression that this solution depends on the KRaft listener always returning the latest checkpoint in the RaftClient#handleLoadSnapshot. If so, should we make that explicit? In the "lossy versus lossless downgrades" section, you have "we will check to see if anything was lost when writing the image at the lower metadata version. If there is, we will abort the downgrade process." Should we be more explicit and say that the active controller will perform the check. When you say "abort" do you mean that the active controller will return an INVALID_UPDATE_VERSION error for the UPDATE_FEATURES RPC and no FeatureLevelRecord will be written to the cluster metadata partition? Given that checkpoint generation is asynchronous from committing the new metadata version, should we have metrics (or a different mechanism) that the user can monitor to determine when it is safe to downgrade the software version of a node? Thanks, -- -José