Thanks, I think having the leader election consider the metadata.version would be a good idea moving forward, but we do not need to include it in this KIP.
On Tue, Nov 16, 2021 at 6:37 AM David Arthur <david.art...@confluent.io.invalid> wrote:

> Guozhang,
>
> 1. By requiring a majority of all controller nodes to support the version selected by the leader, we increase the likelihood that the next leader will also support it. We can't guarantee that all nodes definitely support the selected metadata.version because there could always be an offline or partitioned peer that is running old software.
>
> If a controller running old software manages to get elected, we hit this case:
>
> > In the unlikely event that an active controller encounters an unsupported metadata.version, it should resign and terminate.
>
> So, given this, we should be able to eventually elect a controller that does support the metadata.version.
>
> Consider controllers C1, C2, C3 with this arrangement:
>
> Node  SoftwareVer  MaxMetadataVer
> C1    3.2          1
> C2    3.3          4
> C3    3.3          4
>
> If the current metadata.version is 1 and we're trying to upgrade to 4, we would allow it since two of the three nodes support it. If any one controller is down while we are attempting an upgrade, we would require that both of the remaining alive nodes support the target metadata.version (since we require a majority of _all_ controller nodes, not just alive ones).
>
> An interesting case here is how to handle a version update if a majority of the quorum supports it, but the leader doesn't. For example, if C1 was the leader and an upgrade to version 4 was requested. Maybe this would trigger C1 to resign and inform the client to retry the update later.
>
> We may eventually want to consider the metadata.version when electing a leader, but as long as we have the majority requirement before committing a new metadata.version, I think we should be safe.
>
> -David
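To make the majority requirement above concrete, here is a rough sketch of the check the active controller could perform before committing a new metadata.version. The helper and its inputs are hypothetical (not code from the KIP); in practice the supported versions would come from the ApiVersionsResponse data the leader already holds.

    import java.util.Map;

    // Hypothetical helper: commit an upgrade to `target` only if a strict
    // majority of ALL quorum voters (including offline ones) support it.
    final class MetadataVersionUpgradeCheck {
        static boolean canUpgrade(Map<Integer, Short> maxSupportedByController,
                                  int totalVoters,
                                  short target) {
            long supporters = maxSupportedByController.values().stream()
                    .filter(max -> max >= target)
                    .count();
            return supporters > totalVoters / 2;
        }
    }

    // With the table above: canUpgrade({C1=1, C2=4, C3=4}, 3, (short) 4)
    // returns true (2 of 3 voters support version 4). With C2 offline and
    // absent from the map, it returns false (1 of 3 is not a majority).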
> On Mon, Nov 15, 2021 at 12:52 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Thanks David,
> >
> > 1. Got it. One thing I'm still not very clear on is why it's sufficient to select a metadata.version which is supported by a majority of the quorum, but not the whole quorum (i.e. choosing the lowest version among all the quorum members)? Since the leader election today does not take this value into consideration, we are not guaranteed that newly selected leaders would always be able to recognize and support the initialized metadata.version, right?
> >
> > 2. Yeah, I think I agree the behavior-but-not-RPC-change scenario is beyond the scope of this KIP; we can defer it to later discussions.
> >
> > On Mon, Nov 15, 2021 at 8:13 AM David Arthur <david.art...@confluent.io.invalid> wrote:
> >
> > > Guozhang, thanks for the review!
> > >
> > > 1. As we've defined it so far, the initial metadata.version is set by an operator via the "kafka-storage.sh" tool. It would be possible for different values to be selected, but only the quorum leader would commit a FeatureLevelRecord with the version it read locally. See the above reply to Jun's question for a little more detail.
> > >
> > > We need to enable the KRaft RPCs regardless of metadata.version (vote, heartbeat, fetch, etc.) so that the quorum can be formed, a leader can be elected, and followers can learn about the selected metadata.version. I believe the quorum startup procedure would go something like:
> > >
> > > * Controller nodes start, read their logs, begin leader election
> > > * While the elected node is becoming leader (in QuorumMetaLogListener#handleLeaderChange), initialize metadata.version if necessary
> > > * Followers replicate the FeatureLevelRecord
> > >
> > > This (probably) means the quorum peers must continue to rely on ApiVersions to negotiate compatible RPC versions, and these versions cannot depend on metadata.version.
> > >
> > > Does this make sense?
> > >
> > > 2. ApiVersionsResponse includes the broker's supported feature flags as well as the cluster-wide finalized feature flags. We probably need to add something like the feature flag "epoch" to this response payload in order to see which broker is most up to date.
> > >
> > > If the new feature flag version included RPC changes, we are helped by the fact that a client won't attempt to use the new RPC until it has discovered a broker that supports it via ApiVersions. The problem is more difficult for cases like you described where the feature flag upgrade changes the behavior of the broker, but not its RPCs. This is actually the same problem as upgrading the IBP. During a rolling restart, clients may hit different brokers with different capabilities and not know it.
> > >
> > > This probably warrants further investigation, but hopefully you agree it is beyond the scope of this KIP :)
> > >
> > > -David
> > >
> > > On Mon, Nov 15, 2021 at 10:26 AM David Arthur <david.art...@confluent.io> wrote:
> > >
> > > > Jun, thanks for the comments!
> > > >
> > > > 16. When a new cluster is deployed, we don't select the highest available metadata.version, but rather the quorum leader picks a bootstrap version defined in meta.properties. As mentioned earlier, we should add validation here to ensure a majority of the followers could support this version before initializing it. This would avoid a situation where a failover results in a new leader who can't support the selected metadata.version.
> > > >
> > > > Thinking a bit more on this, we do need to define a state where we're running newer software, but we don't have the feature flag set. This could happen if we were running an older IBP that did not support KIP-778. Following on this, it doesn't seem too difficult to consider a case where the IBP has been upgraded, but we still have not finalized a metadata.version. Here are some possible version combinations (assuming KIP-778 is added to Kafka 3.2):
> > > >
> > > > Case  Software  IBP  metadata.version  effective version
> > > > -----------------------------------------------------------------
> > > > A     3.1       3.1  -                 0  software too old for feature flag
> > > > B     3.2       3.1  -                 0  feature flag supported, but IBP too old
> > > > C     3.2       3.2  -                 0  feature flag supported, but not initialized
> > > > D     3.2       3.2  1                 1  feature flag initialized to 1 (via operator or bootstrap process)
> > > > ...
> > > > E     3.8       3.1  -                 0  ...IBP too old
> > > > F     3.8       3.2  -                 0  ...not initialized
> > > > G     3.8       3.2  4                 4
> > > >
> > > > Here I'm defining version 0 as "no metadata.version set".
> > > >
> > > > So back to your comment, I think the KIP omits discussion of case C from the above table, which I can amend. Does that cover your concerns, or am I missing something else?
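A rough sketch of how the "effective version" column above could be computed (illustrative only; the version comparison is simplified to major.minor and the method names are invented):

    // Sketch of deriving the "effective version" column from the table above.
    final class EffectiveMetadataVersion {
        static final short UNSET = 0; // "no metadata.version set"

        // finalizedVersion is null when no FeatureLevelRecord has been committed
        static short effective(String software, String ibp, Short finalizedVersion) {
            if (compare(software, "3.2") < 0) return UNSET; // case A: software too old
            if (compare(ibp, "3.2") < 0) return UNSET;      // cases B, E: IBP too old
            if (finalizedVersion == null) return UNSET;     // cases C, F: not initialized
            return finalizedVersion;                        // cases D, G
        }

        // naive "major.minor" comparison, sufficient for this illustration
        static int compare(String a, String b) {
            String[] pa = a.split("\\."), pb = b.split("\\.");
            for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
                int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
                int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
                if (va != vb) return Integer.compare(va, vb);
            }
            return 0;
        }
    }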
> > > > > it's inconvenient for a user to manually upgrade every feature version
> > > >
> > > > For this, we would probably want to extend the capabilities of KIP-584. I don't think anything we've discussed for KIP-778 will preclude us from adding some kind of auto-upgrade in the future.
> > > >
> > > > 21. "disable" sounds good to me. I agree "delete feature-x" sounds a bit weird.
> > > >
> > > > On Mon, Nov 8, 2021 at 8:47 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > >
> > > >> Hello David,
> > > >>
> > > >> Thanks for the very nice writeup! It helped me a lot to refresh my memory on KIP-630/590/584 :)
> > > >>
> > > >> I just had two clarification questions after reading through the KIP:
> > > >>
> > > >> 1. For the initialization procedure, do we guarantee that all the quorum nodes (inactive candidates and leaders, a.k.a. controllers) would always initialize with the same metadata.version? If yes, how is that guaranteed? More specifically, when a quorum candidate is starting up, would it avoid handling any controller requests (including ApiVersionsRequest) from its peers in the quorum until it completes reading the local log? And even if yes, what would happen if there's no FeatureLevelRecord found, and different nodes read different values from their local meta.properties file or initialize from their binary's hard-coded values?
> > > >>
> > > >> 2. This is not only for the metadata.version itself, but for general feature.versions: when a version is upgraded / downgraded, since brokers would read the FeatureLevelRecord at their own pace, there will always be a race window when some brokers have processed the record and completed the upgrade while others have not, and hence may behave differently --- I'm thinking of the future, like the specific replica selector that allows fetching from followers, and even more advanced selectors --- i.e. should we consider letting clients only talk to brokers with the highest metadata log offsets, for example?
> > > >>
> > > >> Guozhang
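A sketch of what that last idea might look like on the client side, assuming the hypothetical per-broker finalized-features "epoch" floated in point 2 above (none of these types exist in Kafka):

    import java.util.List;

    // Hypothetical: route feature-sensitive requests only to brokers whose
    // finalized-features epoch (the metadata log offset at which the features
    // were finalized) matches the newest epoch the client has observed.
    record BrokerFeatureState(int brokerId, long finalizedFeaturesEpoch) {}

    final class FeatureAwareBrokerSelector {
        static List<BrokerFeatureState> caughtUpBrokers(List<BrokerFeatureState> brokers) {
            long maxEpoch = brokers.stream()
                    .mapToLong(BrokerFeatureState::finalizedFeaturesEpoch)
                    .max()
                    .orElse(-1L);
            return brokers.stream()
                    .filter(b -> b.finalizedFeaturesEpoch() >= maxEpoch)
                    .toList();
        }
    }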
> > > >> On Fri, Nov 5, 2021 at 3:18 PM Jun Rao <j...@confluent.io.invalid> wrote:
> > > >>
> > > >> > Hi, David,
> > > >> >
> > > >> > Thanks for the reply.
> > > >> >
> > > >> > 16. My first concern is that the KIP picks up meta.version inconsistently during the deployment. If a new cluster is started, we pick up the highest version. If we upgrade, we leave the feature version unchanged. Intuitively, it seems that independent of how a cluster is deployed, we should always pick the same feature version. I think we need to think this through in this KIP. My second concern is that as a particular version matures, it's inconvenient for a user to manually upgrade every feature version. As long as we have a path to achieve that in the future, we don't need to address that in this KIP.
> > > >> >
> > > >> > 21. "./kafka-features.sh delete": Deleting a feature seems a bit weird since the logic is always there. Would it be better to use disable?
> > > >> >
> > > >> > Jun
> > > >> >
> > > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur <david.art...@confluent.io.invalid> wrote:
> > > >> >
> > > >> > > Colin and Jun, thanks for the additional comments!
> > > >> > >
> > > >> > > Colin:
> > > >> > >
> > > >> > > > We've been talking about having an automated RPC compatibility checker
> > > >> > >
> > > >> > > Do we have a way to mark fields in schemas as deprecated? It can stay in the RPC, it just complicates the logic a bit.
> > > >> > >
> > > >> > > > It would be nice if the active controller could validate that a majority of the quorum could use the proposed metadata.version. The active controller should have this information, right? If we don't have recent information from a quorum of voters, we wouldn't be active.
> > > >> > >
> > > >> > > I believe we should have this information from the ApiVersionsResponse. It would be good to do this validation to avoid a situation where a quorum leader can't be elected due to unprocessable records.
> > > >> > >
> > > >> > > > Do we need delete as a command separate from downgrade?
> > > >> > >
> > > >> > > I think from an operator's perspective, it is nice to distinguish between changing a feature flag and unsetting it. It might be surprising to an operator to see the flag's version set to nothing when they requested the downgrade to version 0 (or less).
> > > >> > >
> > > >> > > > it seems like we should spell out that metadata.version begins at 1 in KRaft clusters
> > > >> > >
> > > >> > > I added this text:
> > > >> > >
> > > >> > > > Introduce an IBP version to indicate the lowest software version that supports *metadata.version*. Below this IBP, the *metadata.version* is undefined and will not be examined. At or above this IBP, the *metadata.version* must be *0* for ZooKeeper clusters and will be initialized as *1* for KRaft clusters.
> > > >> > >
> > > >> > > > We probably also want an RPC implemented by both brokers and controllers that will reveal the min and max supported versions for each feature level supported by the server
> > > >> > >
> > > >> > > This is available in ApiVersionsResponse (we include the server's supported features as well as the cluster's finalized features).
> > > >> > >
> > > >> > > --------
> > > >> > >
> > > >> > > Jun:
> > > >> > >
> > > >> > > 12. I've updated the KIP with AdminClient changes.
> > > >> > >
> > > >> > > 14. You're right, it looks like I missed a few sections regarding snapshot generation. I've corrected it.
> > > >> > >
> > > >> > > 16. This feels more like an enhancement to KIP-584. I agree it could be useful, but perhaps we could address it separately from KRaft upgrades?
> > > >> > >
> > > >> > > 20. Indeed, snapshots are not strictly necessary during an upgrade; I've reworded this.
> > > >> > >
> > > >> > > Thanks!
> > > >> > > David
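For reference, the supported and finalized features mentioned above are already exposed through the KIP-584 Admin API; a minimal usage sketch (assuming a Kafka 2.7+ client; the bootstrap address is a placeholder):

    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.FeatureMetadata;

    // Inspect the cluster's finalized feature flags (which would include
    // metadata.version once this KIP lands) and the broker's supported ranges.
    public class DescribeFeaturesExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                FeatureMetadata features = admin.describeFeatures().featureMetadata().get();
                features.finalizedFeatures().forEach((name, range) ->
                    System.out.printf("finalized %s: min=%d max=%d%n",
                        name, range.minVersionLevel(), range.maxVersionLevel()));
                features.supportedFeatures().forEach((name, range) ->
                    System.out.printf("supported %s: min=%d max=%d%n",
                        name, range.minVersion(), range.maxVersion()));
            }
        }
    }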
> > > >> > > On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <j...@confluent.io.invalid> wrote:
> > > >> > >
> > > >> > > > Hi, David, Jose and Colin,
> > > >> > > >
> > > >> > > > Thanks for the reply. A few more comments.
> > > >> > > >
> > > >> > > > 12. It seems that we haven't updated the AdminClient accordingly?
> > > >> > > >
> > > >> > > > 14. "Metadata snapshot is generated and sent to the other inactive controllers and to brokers". I thought we wanted each broker to generate its own snapshot independently? If only the controller generates the snapshot, how do we force other brokers to pick it up?
> > > >> > > >
> > > >> > > > 16. If a feature version is new, one may not want to enable it immediately after the cluster is upgraded. However, if a feature version has been stable, requiring every user to run a command to upgrade to that version seems inconvenient. One way to improve this is for each feature to define one version as the default. Then, when we upgrade a cluster, we will automatically upgrade the feature to the default version. An admin could use the tool to upgrade to a version higher than the default.
> > > >> > > >
> > > >> > > > 20. "The quorum controller can assist with this process by generating a metadata snapshot after a metadata.version increase has been committed to the metadata log. This snapshot will be a convenient way to let broker and controller components rebuild their entire in-memory state following an upgrade." The new version of the software could read both the new and the old version. Is generating a new snapshot during upgrade needed?
> > > >> > > >
> > > >> > > > Jun
> > > >> > > >
> > > >> > > > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > >> > > >
> > > >> > > > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> > > >> > > > > > Hi, David,
> > > >> > > > > >
> > > >> > > > > > One more comment.
> > > >> > > > > >
> > > >> > > > > > 16. The main reason why KIP-584 requires finalizing a feature manually is that in the ZK world, the controller doesn't know all brokers in a cluster. A broker temporarily down is not registered in ZK. In the KRaft world, the controller keeps track of all brokers, including those that are temporarily down. This makes it possible for the controller to automatically finalize a feature---it's safe to do so when all brokers support that feature. This will make the upgrade process much simpler since no manual command is required to turn on a new feature. Have we considered this?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > >
> > > >> > > > > > Jun
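A sketch of the auto-finalization Jun describes (hypothetical controller-side logic, assuming the controller can enumerate every registered broker's supported version range):

    import java.util.Collection;
    import java.util.OptionalInt;

    // Hypothetical: auto-finalize a feature at the highest version that every
    // registered broker supports; an empty result means no common version.
    record SupportedRange(short min, short max) {}

    final class AutoFinalize {
        // callers should ensure at least one broker is registered
        static OptionalInt maxCommonVersion(Collection<SupportedRange> brokerRanges) {
            int lo = Short.MIN_VALUE, hi = Short.MAX_VALUE;
            for (SupportedRange r : brokerRanges) {
                lo = Math.max(lo, r.min());
                hi = Math.min(hi, r.max());
            }
            return lo <= hi ? OptionalInt.of(hi) : OptionalInt.empty();
        }
    }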
> > > >> > > > > Hi Jun,
> > > >> > > > >
> > > >> > > > > I guess David commented on this point already, but I'll comment as well. I always had the perception that users viewed rolls as potentially risky and were looking for ways to reduce the risk. Not enabling features right away after installing new software seems like one way to do that. If we had a feature to automatically upgrade during a roll, I'm not sure that I would recommend that people use it, because if something fails, it makes it harder to tell if the new feature is at fault, or something else in the new software.
> > > >> > > > >
> > > >> > > > > We already tell users to do a "double roll" when going to a new IBP. (Just to give background to people who haven't heard that phrase, the first roll installs the new software, and the second roll updates the IBP.) So this KIP-778 mechanism is basically very similar to that, except the second thing isn't a roll, but just an upgrade command. So I think this is consistent with what we currently do.
> > > >> > > > >
> > > >> > > > > Also, just like David said, we can always add auto-upgrade later if there is demand...
> > > >> > > > >
> > > >> > > > > best,
> > > >> > > > > Colin
> > > >> > > > >
> > > >> > > > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <j...@confluent.io> wrote:
> > > >> > > > > >
> > > >> > > > > >> Hi, David,
> > > >> > > > > >>
> > > >> > > > > >> Thanks for the KIP. A few comments below.
> > > >> > > > > >>
> > > >> > > > > >> 10. It would be useful to describe how the controller node determines the RPC version used to communicate to other controller nodes. There seems to be a bootstrap problem. A controller node can't read the log, and therefore the feature level, until a quorum leader is elected. But leader election requires an RPC.
> > > >> > > > > >>
> > > >> > > > > >> 11. For downgrades, it would be useful to describe how to determine that the downgrade process (generating a new snapshot, propagating the snapshot, etc.) has completed. We could block the UpdateFeature request until the process is completed. However, since the process could take time, the request could time out. Another way is through DescribeFeature, where the server only reports downgraded versions after the process is completed.
> > > >> > > > > >> 12. Since we are changing UpdateFeaturesRequest, do we need to change the AdminClient API for updateFeatures too?
> > > >> > > > > >>
> > > >> > > > > >> 13. For the paragraph starting with "In the absence of an operator defined value for metadata.version": in KIP-584, we described how to finalize features with new cluster bootstrap. In that case, it's inconvenient for the users to have to run an admin tool to finalize the version for each feature. Instead, the system detects that the /features path is missing in ZK and thus automatically finalizes every feature with the latest supported version. Could we do something similar in KRaft mode?
> > > >> > > > > >>
> > > >> > > > > >> 14. After the quorum leader generates a new snapshot, how do we force other nodes to pick up the new snapshot?
> > > >> > > > > >>
> > > >> > > > > >> 15. I agree with Jose that it will be useful to describe when generating a new snapshot is needed. To me, it seems the new snapshot is only needed when incompatible changes are made.
> > > >> > > > > >>
> > > >> > > > > >> 7. Jose, what control records were you referring to?
> > > >> > > > > >>
> > > >> > > > > >> Thanks,
> > > >> > > > > >>
> > > >> > > > > >> Jun
> > > >> > > > > >>
> > > >> > > > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org> wrote:
> > > >> > > > > >>
> > > >> > > > > >>> Jose, thanks for the thorough review and comments!
> > > >> > > > > >>>
> > > >> > > > > >>> I am out of the office until next week, so I probably won't be able to update the KIP until then. Here are some replies to your questions:
> > > >> > > > > >>>
> > > >> > > > > >>> 1. Generate snapshot on upgrade
> > > >> > > > > >>> > > Metadata snapshot is generated and sent to the other nodes
> > > >> > > > > >>> > Why does the Active Controller need to generate a new snapshot and force a snapshot fetch from the replicas (inactive controllers and brokers) on an upgrade? Isn't writing the FeatureLevelRecord good enough to communicate the upgrade to the replicas?
> > > >> > > > > >>>
> > > >> > > > > >>> You're right, we don't necessarily need to _transmit_ a snapshot, since each node can generate its own equivalent snapshot.
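To illustrate the "generate its own snapshot" idea, a sketch under the assumption that replicas observe FeatureLevelRecord changes through a listener (these types are invented, not Kafka code):

    // Sketch: a replica that observes a metadata.version change could snapshot
    // its own in-memory state at that offset instead of fetching a snapshot
    // from the active controller. On a downgrade, the snapshot may be lossy
    // because fields unknown to the older version are dropped.
    interface MetadataImage {
        void writeLocalSnapshot(long lastIncludedOffset, short metadataVersion);
    }

    final class FeatureLevelListener {
        private final MetadataImage image;

        FeatureLevelListener(MetadataImage image) {
            this.image = image;
        }

        void onFeatureLevelRecord(long offset, short oldVersion, short newVersion) {
            if (newVersion != oldVersion) {
                image.writeLocalSnapshot(offset, newVersion);
            }
        }
    }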
> > > >> > > > > >>> 2. Generate snapshot on downgrade
> > > >> > > > > >>> > > Metadata snapshot is generated and sent to the other inactive controllers and to brokers (this snapshot may be lossy!)
> > > >> > > > > >>> > Why do we need to send this downgraded snapshot to the brokers? The replicas have seen the FeatureLevelRecord and noticed the downgrade. Can we have the replicas each independently generate a downgraded snapshot at the offset for the downgraded FeatureLevelRecord? I assume that the active controller will guarantee that all records after the FeatureLevelRecord use the downgraded version. If so, it would be good to mention that explicitly.
> > > >> > > > > >>>
> > > >> > > > > >>> Similar to above, yes, a broker that detects a downgrade via FeatureLevelRecord could generate its own downgrade snapshot and reload its state from that. This does get a little fuzzy when we consider cases where brokers are on different software versions and could be generating a downgrade snapshot for version X, but using different versions of the code. It might be safer to let the controller generate the snapshot so each broker (regardless of software version) gets the same records. However, for upgrades (or downgrades) we expect the whole cluster to be running the same software version before triggering the metadata.version change, so perhaps this isn't a likely scenario. Thoughts?
> > > >> > > > > >>>
> > > >> > > > > >>> 3. Max metadata version
> > > >> > > > > >>> > > For the first release that supports metadata.version, we can simply initialize metadata.version with the current (and only) version. For future releases, we will need a mechanism to bootstrap a particular version. This could be done using the meta.properties file or some similar mechanism. The reason we need to allow for a specific initial version is to support the use case of starting a Kafka cluster at version X with an older metadata.version.
> > > >> > > > > >>>
> > > >> > > > > >>> > I assume that the Active Controller will learn the metadata version of the broker through the BrokerRegistrationRequest. How will the Active Controller learn about the max metadata version of the inactive controller nodes? We currently don't send a registration request from the inactive controller to the active controller.
> > > >> > > > > >>>
> > > >> > > > > >>> This came up during the design, but I neglected to add it to the KIP. We will need a mechanism for determining the supported features of each controller, similar to how brokers use BrokerRegistrationRequest. Perhaps controllers could write a FeatureLevelRecord (or similar) to the metadata log indicating their supported version. WDYT?
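As a sketch of the meta.properties-based bootstrap mentioned above (the "metadata.version" property key is an assumption; the text only says "the meta.properties file or some similar mechanism"):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Properties;

    // Sketch: read an operator-provided initial metadata.version at first
    // startup, falling back to the binary's default bootstrap version.
    final class MetadataVersionBootstrap {
        static short initialMetadataVersion(Path metaProperties, short defaultVersion) throws Exception {
            Properties props = new Properties();
            try (InputStream in = Files.newInputStream(metaProperties)) {
                props.load(in);
            }
            String value = props.getProperty("metadata.version"); // hypothetical key
            return value == null ? defaultVersion : Short.parseShort(value);
        }
    }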
> > > >> > > > > >>> > Why do you need to bootstrap a particular version? Isn't the intent that the broker will learn the active metadata version by reading the metadata before unfencing?
> > > >> > > > > >>>
> > > >> > > > > >>> This bootstrapping is needed for when a KRaft cluster is first started. If we don't have this mechanism, the cluster can't really do anything until the operator finalizes the metadata.version with the tool. The bootstrapping will be done by the controller and the brokers will see this version as a record (like you say). I'll add some text to clarify this.
> > > >> > > > > >>>
> > > >> > > > > >>> 4. Reject Registration - This is related to the bullet point above.
> > > >> > > > > >>> > What will be the behavior of the active controller if the broker sends a metadata version that is not compatible with the cluster-wide metadata version?
> > > >> > > > > >>>
> > > >> > > > > >>> If a broker starts up with a lower supported version range than the current cluster metadata.version, it should log an error and shut down. This is in line with KIP-584.
> > > >> > > > > >>>
> > > >> > > > > >>> 5. Discover upgrade
> > > >> > > > > >>> > > This snapshot will be a convenient way to let broker and controller components rebuild their entire in-memory state following an upgrade.
> > > >> > > > > >>> > Can we rely on the presence of the FeatureLevelRecord for the metadata version for this functionality? If so, it avoids having to reload the snapshot.
> > > >> > > > > >>>
> > > >> > > > > >>> For upgrades, yes, probably, since we won't need to "rewrite" any records in this case. For downgrades, we will need to generate the snapshot and reload everything.
> > > >> > > > > >>>
> > > >> > > > > >>> 6. Metadata version specification
> > > >> > > > > >>> > > V4(version=4, isBackwardsCompatible=false, description="New metadata record type Bar"),
> > > >> > > > > >>> > Very cool. Do you have plans to generate Apache Kafka HTML documentation for this information? Would it be helpful to display this information to the user via kafka-features.sh and the feature RPC?
> > > >> > > > > >>>
> > > >> > > > > >>> Hm, good idea :) I'll add a brief section on documentation. This would certainly be very useful.
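Expanding the quoted specification style into a fuller sketch (only the V4 entry comes from the KIP excerpt; V1-V3 are invented for illustration):

    // Sketch of a metadata.version specification enum in the style quoted above.
    enum MetadataVersion {
        V1(1, true, "Initial KRaft version"),          // invented example entry
        V2(2, true, "Added field to record Foo"),      // invented example entry
        V3(3, true, "Added feature flag epoch"),       // invented example entry
        V4(4, false, "New metadata record type Bar");  // from the KIP excerpt

        final int version;
        final boolean isBackwardsCompatible;
        final String description;

        MetadataVersion(int version, boolean isBackwardsCompatible, String description) {
            this.version = version;
            this.isBackwardsCompatible = isBackwardsCompatible;
            this.description = description;
        }
    }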
> > > >> > > > > >>> 7. Downgrade records
> > > >> > > > > >>> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and control records. In KIP-630 we introduced two control records for snapshots.
> > > >> > > > > >>>
> > > >> > > > > >>> Yes, good call. Let me re-read that KIP and include some details.
> > > >> > > > > >>>
> > > >> > > > > >>> Thanks again for the comments!
> > > >> > > > > >>> -David
> > > >> > > > > >>>
> > > >> > > > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio <jsan...@confluent.io.invalid> wrote:
> > > >> > > > > >>>
> > > >> > > > > >>> > One more comment.
> > > >> > > > > >>> >
> > > >> > > > > >>> > 7. Downgrade records
> > > >> > > > > >>> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and control records. In KIP-630 we introduced two control records for snapshots.

--
-- Guozhang