On Tue, Nov 16, 2021, at 06:36, David Arthur wrote:
> An interesting case here is how to handle a version update if a majority of
> the quorum supports it, but the leader doesn't. For example, if C1 was the
> leader and an upgrade to version 4 was requested. Maybe this would trigger
> C1 to resign and inform the client to retry the update later.
>

Hmm, wouldn't we want to just reject the version update in this case? I don't 
see what the advantage of allowing it would be.

The reason for requiring a majority rather than all voters is mainly to cover 
the case where a voter is down, I thought. That clearly doesn't apply if the 
un-upgraded voter is the leader itself...
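
To make the check concrete, here is a rough sketch of what I'd expect the
active controller to do before committing a metadata.version bump. The class
and method names are made up for illustration; this is not the actual
controller code:

    import java.util.Map;

    // Sketch only: reject the update outright if the leader itself cannot
    // support the target version, otherwise require that a majority of the
    // voters (as last reported via ApiVersions) support it.
    public class MetadataVersionUpgradeCheck {
        public static boolean canCommitUpgrade(short targetVersion,
                                               short leaderMaxSupported,
                                               Map<Integer, Short> voterMaxSupported) {
            if (targetVersion > leaderMaxSupported) {
                // The leader could not process records at the new version,
                // so just reject the UpdateFeatures request.
                return false;
            }
            long supportingVoters = voterMaxSupported.values().stream()
                .filter(max -> max >= targetVersion)
                .count();
            int majority = voterMaxSupported.size() / 2 + 1;
            return supportingVoters >= majority;
        }
    }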

>
> We may eventually want to consider the metadata.version when electing a
> leader, but as long as we have the majority requirement before committing a
> new metadata.version, I think we should be safe.
>

Yeah, this is safe. If we elect a leader at metadata.version X then that means 
that a majority of the cluster is at least at version X. Proof by 
contradiction: assume that this is not the case. Then the newly elected leader 
must have a shorter __cluster_metadata log than a majority of the voters. But 
this is incompatible with winning a Raft election.
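
Slightly more formally, it's the usual quorum intersection argument. Let V be
the voter set and \ell the newly elected leader. Committing the
FeatureLevelRecord for version X requires

    \exists M \subseteq V,\ |M| > |V|/2:\ \forall v \in M,\ \mathrm{FeatureLevelRecord}(X) \in \mathrm{log}(v)

and winning an election requires votes from some majority, where a voter only
grants its vote to a candidate whose log is at least as up-to-date as its own:

    \exists M' \subseteq V,\ |M'| > |V|/2:\ \forall v \in M',\ \mathrm{log}(v) \preceq \mathrm{log}(\ell)

Since |M| + |M'| > |V|, the two majorities intersect, so \mathrm{log}(\ell)
already contains the committed FeatureLevelRecord(X).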

In the case where the leader is "behind" some of the other voters, those voters 
will truncate their logs to match the new leader. This will downgrade them. 
Basically this is the case where the feature upgrade was proposed, but never 
fully completed.

best,
Colin


> -David
>
> On Mon, Nov 15, 2021 at 12:52 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Thanks David,
>>
>> 1. Got it. One thing I'm still not very clear on is why it's sufficient to
>> select a metadata.version that is supported by a majority of the quorum, but
>> not the whole quorum (i.e. choosing the lowest version among all the quorum
>> members)? Since leader election today does not take this value into
>> consideration, we are not guaranteed that newly elected leaders would always
>> be able to recognize and support the initialized metadata.version, right?
>>
>> 2. Yeah I think I agree the behavior-but-not-RPC-change scenario is beyond
>> the scope of this KIP, we can defer it to later discussions.
>>
>> On Mon, Nov 15, 2021 at 8:13 AM David Arthur
>> <david.art...@confluent.io.invalid> wrote:
>>
>> > Guozhang, thanks for the review!
>> >
>> > 1, As we've defined it so far, the initial metadata.version is set by an
>> > operator via the "kafka-storage.sh" tool. It would be possible for
>> > different values to be selected, but only the quorum leader would commit
>> a
>> > FeatureLevelRecord with the version they read locally. See the above
>> reply
>> > to Jun's question for a little more detail.
>> >
>> > We need to enable the KRaft RPCs regardless of metadata.version (vote,
>> > heartbeat, fetch, etc) so that the quorum can be formed, a leader can be
>> > elected, and followers can learn about the selected metadata.version. I
>> > believe the quorum startup procedure would go something like:
>> >
>> > * Controller nodes start, read their logs, begin leader election
>> > * While the elected node is becoming leader (in
>> > QuorumMetaLogListener#handleLeaderChange), initialize metadata.version if
>> > necessary
>> > * Followers replicate the FeatureLevelRecord
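>> >
>> > For illustration, the "initialize metadata.version if necessary" step above
>> > might look roughly like the sketch below. The types and names here are
>> > hypothetical, not the real QuorumMetaLogListener API:
>> >
>> >     // Hypothetical sketch of the bootstrap step on becoming leader.
>> >     public class MetadataVersionBootstrap {
>> >         interface MetadataLog {
>> >             boolean containsFeatureLevelRecord(String featureName);
>> >             void append(String featureName, short level);
>> >         }
>> >
>> >         static void onLeaderChange(int newLeaderId,
>> >                                    int localNodeId,
>> >                                    short bootstrapVersion, // from meta.properties or the binary default
>> >                                    MetadataLog log) {
>> >             if (newLeaderId != localNodeId) {
>> >                 return; // only the newly elected leader writes the bootstrap record
>> >             }
>> >             if (log.containsFeatureLevelRecord("metadata.version")) {
>> >                 return; // already initialized; followers just replicate it
>> >             }
>> >             log.append("metadata.version", bootstrapVersion);
>> >         }
>> >     }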
>> >
>> > This (probably) means the quorum peers must continue to rely on
>> ApiVersions
>> > to negotiate compatible RPC versions and these versions cannot depend on
>> > metadata.version.
>> >
>> > Does this make sense?
>> >
>> >
>> > 2, ApiVersionResponse includes the broker's supported feature flags as
>> well
>> > as the cluster-wide finalized feature flags. We probably need to add
>> > something like the feature flag "epoch" to this response payload in order
>> > to see which broker is most up to date.
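>> >
>> > For reference, the KIP-584 Admin API already exposes both sets today
>> > (writing this from memory, so double-check the exact accessors), e.g.:
>> >
>> >     import java.util.Properties;
>> >     import org.apache.kafka.clients.admin.Admin;
>> >     import org.apache.kafka.clients.admin.AdminClientConfig;
>> >     import org.apache.kafka.clients.admin.FeatureMetadata;
>> >
>> >     public class DescribeFeaturesExample {
>> >         public static void main(String[] args) throws Exception {
>> >             Properties props = new Properties();
>> >             props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>> >             try (Admin admin = Admin.create(props)) {
>> >                 FeatureMetadata metadata =
>> >                     admin.describeFeatures().featureMetadata().get();
>> >                 System.out.println("Supported: " + metadata.supportedFeatures());
>> >                 System.out.println("Finalized: " + metadata.finalizedFeatures());
>> >                 System.out.println("Epoch:     " + metadata.finalizedFeaturesEpoch());
>> >             }
>> >         }
>> >     }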
>> >
>> > If the new feature flag version included RPC changes, we are helped by
>> the
>> > fact that a client won't attempt to use the new RPC until it has
>> discovered
>> > a broker that supports it via ApiVersions. The problem is more difficult
>> > for cases like you described where the feature flag upgrade changes the
>> > behavior of the broker, but not its RPCs. This is actually the same
>> problem
>> > as upgrading the IBP. During a rolling restart, clients may hit different
>> > brokers with different capabilities and not know it.
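>> >
>> > For the RPC-change case, the client-side gate amounts to taking the newest
>> > version both sides advertise via ApiVersions -- roughly (illustrative only,
>> > not actual client code):
>> >
>> >     public class RpcVersionGate {
>> >         // Pick the newest request version supported by both the client and
>> >         // the broker it happens to be connected to.
>> >         static short chooseRequestVersion(short clientMaxVersion, short brokerMaxVersion) {
>> >             return (short) Math.min(clientMaxVersion, brokerMaxVersion);
>> >         }
>> >
>> >         public static void main(String[] args) {
>> >             // Broker still on old software: the client falls back to the old version.
>> >             System.out.println(chooseRequestVersion((short) 4, (short) 3)); // prints 3
>> >         }
>> >     }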
>> >
>> > This probably warrants further investigation, but hopefully you agree it
>> is
>> > beyond the scope of this KIP :)
>> >
>> > -David
>> >
>> >
>> > On Mon, Nov 15, 2021 at 10:26 AM David Arthur <david.art...@confluent.io
>> >
>> > wrote:
>> >
>> > > Jun, thanks for the comments!
>> > >
>> > > 16, When a new cluster is deployed, we don't select the highest
>> available
>> > > metadata.version, but rather the quorum leader picks a bootstrap
>> version
>> > > defined in meta.properties. As mentioned earlier, we should add
>> > validation
>> > > here to ensure a majority of the followers could support this version
>> > > before initializing it. This would avoid a situation where a failover
>> > > results in a new leader who can't support the selected
>> metadata.version.
>> > >
>> > > Thinking a bit more on this, we do need to define a state where we're
>> > > running newer software, but we don't have the feature flag set. This
>> > could
>> > > happen if we were running an older IBP that did not support KIP-778.
>> > > Following on this, it doesn't seem too difficult to consider a case
>> where
>> > > the IBP has been upgraded, but we still have not finalized a
>> > > metadata.version. Here are some possible version combinations (assuming
>> > > KIP-778 is added to Kafka 3.2):
>> > >
>> > > Case  Software    IBP    metadata.version    effective version
>> > > --------------------------------------------------------------
>> > > A     3.1         3.1    -                   0  software too old for feature flag
>> > > B     3.2         3.1    -                   0  feature flag supported, but IBP too old
>> > > C     3.2         3.2    -                   0  feature flag supported, but not initialized
>> > > D     3.2         3.2    1                   1  feature flag initialized to 1 (via operator or bootstrap process)
>> > > ...
>> > > E     3.8         3.1    -                   0  ...IBP too old
>> > > F     3.8         3.2    -                   0  ...not initialized
>> > > G     3.8         3.2    4                   4
>> > >
>> > >
>> > > Here I'm defining version 0 as "no metadata.version set". So back to
>> your
>> > > comment, I think the KIP omits discussion of case C from the above
>> table
>> > > which I can amend. Does that cover your concerns, or am I missing
>> > something
>> > > else?
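>> > >
>> > > In code form, the "effective version" column above boils down to something
>> > > like this (purely illustrative, not actual broker logic):
>> > >
>> > >     public class EffectiveVersion {
>> > >         // finalized == null corresponds to "-" in the table above.
>> > >         static short effective(boolean softwareSupportsFeature,
>> > >                                boolean ibpNewEnough,
>> > >                                Short finalized) {
>> > >             if (!softwareSupportsFeature || !ibpNewEnough || finalized == null) {
>> > >                 return 0;   // cases A, B, C, E, F
>> > >             }
>> > >             return finalized;   // cases D, G
>> > >         }
>> > >
>> > >         public static void main(String[] args) {
>> > >             System.out.println(effective(false, false, null));      // case A -> 0
>> > >             System.out.println(effective(true, true, (short) 4));   // case G -> 4
>> > >         }
>> > >     }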
>> > >
>> > >
>> > > > it's inconvenient for a user to manually upgrade every feature
>> version
>> > >
>> > > For this, we would probably want to extend the capabilities of
>> KIP-584. I
>> > > don't think anything we've discussed for KIP-778 will preclude us from
>> > > adding some kind of auto-upgrade in the future.
>> > >
>> > > 21, "disable" sounds good to me. I agree "delete feature-x" sounds a
>> bit
>> > > weird.
>> > >
>> > >
>> > >
>> > > On Mon, Nov 8, 2021 at 8:47 PM Guozhang Wang <wangg...@gmail.com>
>> wrote:
>> > >
>> > >> Hello David,
>> > >>
>> > >> Thanks for the very nice writeup! It helped me a lot to refresh my
>> > memory
>> > >> on KIP-630/590/584 :)
>> > >>
>> > >> I just had two clarification questions after reading through the KIP:
>> > >>
>> > >> 1. For the initialization procedure, do we guarantee that all the
>> quorum
>> > >> nodes (inactive candidates and leaders, a.k.a. controllers) would
>> always
>> > >> initialize with the same metadata.version? If yes, how is that
>> > guaranteed?
>> > >> More specifically, when a quorum candidate is starting up, would it
>> > avoid
>> > >> handling any controller requests (including APIVersionsRequest) from
>> its
>> > >> peers in the quorum until it completes reading the local log? And even
>> > if
>> > >> yes, what would happen if there's no FeatureLevelRecord found, and
>> > >> different nodes read different values from their local meta.properties
>> > >> file
>> > >> or initialize from their binary's hard-coded values?
>> > >>
>> > >> 2. This is not only for the metadata.version itself, but for general
>> > >> feature.versions: when a version is upgraded / downgraded, since
>> brokers
>> > >> would read the FeatureLevelRecord at their own pace, there will always
>> > be
>> > >> a
>> > >> race window when some brokers have processed the record and completed
>> the
>> > >> upgrade while others have not, hence may behave differently --- I'm
>> > >> thinking for the future like the specific replica selector to allow
>> > >> fetching from follower, and even more advanced selectors --- i.e.
>> should
>> > >> we
>> > >> consider letting clients only talk to brokers with the highest
>> > metadata
>> > >> log offsets for example?
>> > >>
>> > >>
>> > >> Guozhang
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Fri, Nov 5, 2021 at 3:18 PM Jun Rao <j...@confluent.io.invalid>
>> > wrote:
>> > >>
>> > >> > Hi, David,
>> > >> >
>> > >> > Thanks for the reply.
>> > >> >
>> > >> > 16. My first concern is that the KIP picks up meta.version
>> > >> inconsistently
>> > >> > during the deployment. If a new cluster is started, we pick up the
>> > >> highest
>> > >> > version. If we upgrade, we leave the feature version unchanged.
>> > >> > Intuitively, it seems that independent of how a cluster is deployed,
>> > we
>> > >> > should always pick the same feature version. I think we need to
>> think
>> > >> > this through in this KIP. My second concern is that as a particular
>> > >> version
>> > >> > matures, it's inconvenient for a user to manually upgrade every
>> > feature
>> > >> > version. As long as we have a path to achieve that in the future, we
>> > >> don't
>> > >> > need to address that in this KIP.
>> > >> >
>> > >> > 21. "./kafka-features.sh delete": Deleting a feature seems a bit
>> weird
>> > >> > since the logic is always there. Would it be better to use disable?
>> > >> >
>> > >> > Jun
>> > >> >
>> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur
>> > >> > <david.art...@confluent.io.invalid> wrote:
>> > >> >
>> > >> > > Colin and Jun, thanks for the additional comments!
>> > >> > >
>> > >> > > Colin:
>> > >> > >
>> > >> > > > We've been talking about having an automated RPC compatibility
>> > >> checker
>> > >> > >
>> > >> > > Do we have a way to mark fields in schemas as deprecated? It can
>> > stay
>> > >> in
>> > >> > > the RPC, it just complicates the logic a bit.
>> > >> > >
>> > >> > > > It would be nice if the active controller could validate that a
>> > >> > majority
>> > >> > > of the quorum could use the proposed metadata.version. The active
>> > >> > > controller should have this information, right? If we don't have
>> > >> recent
>> > >> > > information  from a quorum of voters, we wouldn't be active.
>> > >> > >
>> > >> > > I believe we should have this information from the
>> > >> ApiVersionsResponse.
>> > >> > It
>> > >> > > would be good to do this validation to avoid a situation where a
>> > >> > > quorum leader can't be elected due to unprocessable records.
>> > >> > >
>> > >> > > > Do we need delete as a command separate from downgrade?
>> > >> > >
>> > >> > > I think from an operator's perspective, it is nice to distinguish
>> > >> between
>> > >> > > changing a feature flag and unsetting it. It might be surprising
>> to
>> > an
>> > >> > > operator to see the flag's version set to nothing when they
>> > requested
>> > >> the
>> > >> > > downgrade to version 0 (or less).
>> > >> > >
>> > >> > > > it seems like we should spell out that metadata.version begins
>> at
>> > 1
>> > >> in
>> > >> > > KRaft clusters
>> > >> > >
>> > >> > > I added this text:
>> > >> > >
>> > >> > > Introduce an IBP version to indicate the lowest software version
>> > that
>> > >> > > > supports *metadata.version*. Below this IBP, the
>> > *metadata.version*
>> > >> is
>> > >> > > > undefined and will not be examined. At or above this IBP, the
>> > >> > > > *metadata.version* must be *0* for ZooKeeper clusters and will
>> be
>> > >> > > > initialized as *1* for KRaft clusters.
>> > >> > >
>> > >> > >
>> > >> > > > We probably also want an RPC implemented by both brokers and
>> > >> > controllers
>> > >> > > that will reveal the min and max supported versions for each
>> feature
>> > >> > level
>> > >> > > supported by the server
>> > >> > >
>> > >> > > This is available in ApiVersionsResponse (we include the server's
>> > >> > supported
>> > >> > > features as well as the cluster's finalized features)
>> > >> > >
>> > >> > > --------
>> > >> > >
>> > >> > > Jun:
>> > >> > >
>> > >> > > 12. I've updated the KIP with AdminClient changes
>> > >> > >
>> > >> > > 14. You're right, it looks like I missed a few sections regarding
>> > >> > snapshot
>> > >> > > generation. I've corrected it
>> > >> > >
>> > >> > > 16. This feels more like an enhancement to KIP-584. I agree it
>> could
>> > >> be
>> > >> > > useful, but perhaps we could address it separately from KRaft
>> > >> upgrades?
>> > >> > >
>> > >> > > 20. Indeed snapshots are not strictly necessary during an upgrade,
>> > >> I've
>> > >> > > reworded this
>> > >> > >
>> > >> > >
>> > >> > > Thanks!
>> > >> > > David
>> > >> > >
>> > >> > >
>> > >> > > On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <j...@confluent.io.invalid>
>> > >> wrote:
>> > >> > >
>> > >> > > > Hi, David, Jose and Colin,
>> > >> > > >
>> > >> > > > Thanks for the reply. A few more comments.
>> > >> > > >
>> > >> > > > 12. It seems that we haven't updated the AdminClient
>> accordingly?
>> > >> > > >
>> > >> > > > 14. "Metadata snapshot is generated and sent to the other
>> inactive
>> > >> > > > controllers and to brokers". I thought we wanted each broker to
>> > >> > generate
>> > >> > > > its own snapshot independently? If only the controller generates
>> > the
>> > >> > > > snapshot, how do we force other brokers to pick it up?
>> > >> > > >
>> > >> > > > 16. If a feature version is new, one may not want to enable it
>> > >> > > immediately
>> > >> > > > after the cluster is upgraded. However, if a feature version has
>> > >> been
>> > >> > > > stable, requiring every user to run a command to upgrade to that
>> > >> > version
>> > >> > > > seems inconvenient. One way to improve this is for each feature
>> to
>> > >> > define
>> > >> > > > one version as the default. Then, when we upgrade a cluster, we
>> > will
>> > >> > > > automatically upgrade the feature to the default version. An
>> admin
>> > >> > could
>> > >> > > > use the tool to upgrade to a version higher than the default.
>> > >> > > >
>> > >> > > > 20. "The quorum controller can assist with this process by
>> > >> generating a
>> > >> > > > metadata snapshot after a metadata.version increase has been
>> > >> committed
>> > >> > to
>> > >> > > > the metadata log. This snapshot will be a convenient way to let
>> > >> broker
>> > >> > > and
>> > >> > > > controller components rebuild their entire in-memory state
>> > >> following an
>> > >> > > > upgrade." The new version of the software could read both the
>> new
>> > >> and
>> > >> > the
>> > >> > > > old version. Is generating a new snapshot during upgrade needed?
>> > >> > > >
>> > >> > > > Jun
>> > >> > > >
>> > >> > > >
>> > >> > > > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmcc...@apache.org
>> >
>> > >> > wrote:
>> > >> > > >
>> > >> > > > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
>> > >> > > > > > Hi, David,
>> > >> > > > > >
>> > >> > > > > > One more comment.
>> > >> > > > > >
>> > >> > > > > > 16. The main reason why KIP-584 requires finalizing a
>> feature
>> > >> > > manually
>> > >> > > > is
>> > >> > > > > > that in the ZK world, the controller doesn't know all
>> brokers
>> > >> in a
>> > >> > > > > cluster.
>> > >> > > > > > A broker temporarily down is not registered in ZK. In the
>> > KRaft
>> > >> > > world,
>> > >> > > > > the
>> > >> > > > > > controller keeps track of all brokers, including those that
>> > are
>> > >> > > > > temporarily
>> > >> > > > > > down. This makes it possible for the controller to
>> > automatically
>> > >> > > > > finalize a
>> > >> > > > > > feature---it's safe to do so when all brokers support that
>> > >> feature.
>> > >> > > > This
>> > >> > > > > > will make the upgrade process much simpler since no manual
>> > >> command
>> > >> > is
>> > >> > > > > > required to turn on a new feature. Have we considered this?
>> > >> > > > > >
>> > >> > > > > > Thanks,
>> > >> > > > > >
>> > >> > > > > > Jun
>> > >> > > > >
>> > >> > > > > Hi Jun,
>> > >> > > > >
>> > >> > > > > I guess David commented on this point already, but I'll
>> comment
>> > as
>> > >> > > well.
>> > >> > > > I
>> > >> > > > > always had the perception that users viewed rolls as
>> potentially
>> > >> > risky
>> > >> > > > and
>> > >> > > > > were looking for ways to reduce the risk. Not enabling
>> features
>> > >> right
>> > >> > > > away
>> > >> > > > > after installing new software seems like one way to do that.
>> If
>> > we
>> > >> > had
>> > >> > > a
>> > >> > > > > feature to automatically upgrade during a roll, I'm not sure
>> > that
>> > >> I
>> > >> > > would
>> > >> > > > > recommend that people use it, because if something fails, it
>> > >> makes it
>> > >> > > > > harder to tell if the new feature is at fault, or something
>> else
>> > >> in
>> > >> > the
>> > >> > > > new
>> > >> > > > > software.
>> > >> > > > >
>> > >> > > > > We already tell users to do a "double roll" when going to a
>> new
>> > >> IBP.
>> > >> > > > (Just
>> > >> > > > > to give background to people who haven't heard that phrase,
>> the
>> > >> first
>> > >> > > > roll
>> > >> > > > > installs the new software, and the second roll updates the
>> IBP).
>> > >> So
>> > >> > > this
>> > >> > > > > KIP-778 mechanism is basically very similar to that, except
>> the
>> > >> > second
>> > >> > > > > thing isn't a roll, but just an upgrade command. So I think
>> this
>> > >> is
>> > >> > > > > consistent with what we currently do.
>> > >> > > > >
>> > >> > > > > Also, just like David said, we can always add auto-upgrade
>> later
>> > >> if
>> > >> > > there
>> > >> > > > > is demand...
>> > >> > > > >
>> > >> > > > > best,
>> > >> > > > > Colin
>> > >> > > > >
>> > >> > > > >
>> > >> > > > > >
>> > >> > > > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <j...@confluent.io>
>> > >> wrote:
>> > >> > > > > >
>> > >> > > > > >> Hi, David,
>> > >> > > > > >>
>> > >> > > > > >> Thanks for the KIP. A few comments below.
>> > >> > > > > >>
>> > >> > > > > >> 10. It would be useful to describe how the controller node
>> > >> > > determines
>> > >> > > > > the
>> > >> > > > > >> RPC version used to communicate to other controller nodes.
>> > >> There
>> > >> > > seems
>> > >> > > > > to
>> > >> > > > > >> be a bootstrap problem. A controller node can't read the
>> log
>> > >> and
>> > >> > > > > >> therefore the feature level until a quorum leader is
>> elected.
>> > >> But
>> > >> > > > leader
>> > >> > > > > >> election requires an RPC.
>> > >> > > > > >>
>> > >> > > > > >> 11. For downgrades, it would be useful to describe how to
>> > >> > determine
>> > >> > > > the
>> > >> > > > > >> downgrade process (generating new snapshot, propagating the
>> > >> > > snapshot,
>> > >> > > > > etc)
>> > >> > > > > >> has completed. We could block the UpdateFeature request
>> until
>> > >> the
>> > >> > > > > process
>> > >> > > > > >> is completed. However, since the process could take time,
>> the
>> > >> > > request
>> > >> > > > > could
>> > >> > > > > >> time out. Another way is through DescribeFeature and the
>> > server
>> > >> > only
>> > >> > > > > >> reports downgraded versions after the process is completed.
>> > >> > > > > >>
>> > >> > > > > >> 12. Since we are changing UpdateFeaturesRequest, do we need
>> > to
>> > >> > > change
>> > >> > > > > the
>> > >> > > > > >> AdminClient api for updateFeatures too?
>> > >> > > > > >>
>> > >> > > > > >> 13. For the paragraph starting with "In the absence of an
>> > >> operator
>> > >> > > > > >> defined value for metadata.version", in KIP-584, we
>> described
>> > >> how
>> > >> > to
>> > >> > > > > >> finalize features with New cluster bootstrap. In that case,
>> > >> it's
>> > >> > > > > >> inconvenient for the users to have to run an admin tool to
>> > >> > finalize
>> > >> > > > the
>> > >> > > > > >> version for each feature. Instead, the system detects that
>> > the
>> > >> > > > /features
>> > >> > > > > >> path is missing in ZK and thus automatically finalizes
>> every
>> > >> > feature
>> > >> > > > > with
>> > >> > > > > >> the latest supported version. Could we do something similar
>> > in
>> > >> the
>> > >> > > > KRaft
>> > >> > > > > >> mode?
>> > >> > > > > >>
>> > >> > > > > >> 14. After the quorum leader generates a new snapshot, how
>> do
>> > we
>> > >> > > force
>> > >> > > > > >> other nodes to pick up the new snapshot?
>> > >> > > > > >>
>> > >> > > > > >> 15. I agree with Jose that it will be useful to describe
>> when
>> > >> > > > > generating a
>> > >> > > > > >> new snapshot is needed. To me, it seems the new snapshot is
>> > >> only
>> > >> > > > needed
>> > >> > > > > >> when incompatible changes are made.
>> > >> > > > > >>
>> > >> > > > > >> 7. Jose, what control records were you referring to?
>> > >> > > > > >>
>> > >> > > > > >> Thanks,
>> > >> > > > > >>
>> > >> > > > > >> Jun
>> > >> > > > > >>
>> > >> > > > > >>
>> > >> > > > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <
>> > >> > davidart...@apache.org
>> > >> > > >
>> > >> > > > > >> wrote:
>> > >> > > > > >>
>> > >> > > > > >>> Jose, thanks for the thorough review and comments!
>> > >> > > > > >>>
>> > >> > > > > >>> I am out of the office until next week, so I probably
>> won't
>> > be
>> > >> > able
>> > >> > > > to
>> > >> > > > > >>> update the KIP until then. Here are some replies to your
>> > >> > questions:
>> > >> > > > > >>>
>> > >> > > > > >>> 1. Generate snapshot on upgrade
>> > >> > > > > >>> > > Metadata snapshot is generated and sent to the other
>> > nodes
>> > >> > > > > >>> > Why does the Active Controller need to generate a new
>> > >> snapshot
>> > >> > > and
>> > >> > > > > >>> > force a snapshot fetch from the replicas (inactive
>> > >> controller
>> > >> > and
>> > >> > > > > >>> > brokers) on an upgrade? Isn't writing the
>> > FeatureLevelRecord
>> > >> > good
>> > >> > > > > >>> > enough to communicate the upgrade to the replicas?
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> You're right, we don't necessarily need to _transmit_ a
>> > >> snapshot,
>> > >> > > > since
>> > >> > > > > >>> each node can generate its own equivalent snapshot
>> > >> > > > > >>>
>> > >> > > > > >>> 2. Generate snapshot on downgrade
>> > >> > > > > >>> > > Metadata snapshot is generated and sent to the other
>> > >> inactive
>> > >> > > > > >>> > controllers and to brokers (this snapshot may be lossy!)
>> > >> > > > > >>> > Why do we need to send this downgraded snapshot to the
>> > >> brokers?
>> > >> > > The
>> > >> > > > > >>> > replicas have seen the FeatureLevelRecord and noticed
>> the
>> > >> > > > downgrade.
>> > >> > > > > >>> > Can we have the replicas each independently generate a
>> > >> > downgraded
>> > >> > > > > >>> > snapshot at the offset for the downgraded
>> > >> FeatureLevelRecord? I
>> > >> > > > > assume
>> > >> > > > > >>> > that the active controller will guarantee that all
>> records
>> > >> > after
>> > >> > > > the
>> > >> > > > > >>> > FeatureLevelRecord use the downgraded version. If so, it
>> > >> would
>> > >> > be
>> > >> > > > good
>> > >> > > > > >>> > to mention that explicitly.
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> Similar to above, yes a broker that detects a downgrade
>> via
>> > >> > > > > >>> FeatureLevelRecord could generate its own downgrade
>> snapshot
>> > >> and
>> > >> > > > reload
>> > >> > > > > >>> its
>> > >> > > > > >>> state from that. This does get a little fuzzy when we
>> > consider
>> > >> > > cases
>> > >> > > > > where
>> > >> > > > > >>> brokers are on different software versions and could be
>> > >> > generating
>> > >> > > a
>> > >> > > > > >>> downgrade snapshot for version X, but using different
>> > >> versions of
>> > >> > > the
>> > >> > > > > >>> code.
>> > >> > > > > >>> It might be safer to let the controller generate the
>> > snapshot
>> > >> so
>> > >> > > each
>> > >> > > > > >>> broker (regardless of software version) gets the same
>> > records.
>> > >> > > > However,
>> > >> > > > > >>> for
>> > >> > > > > >>> upgrades (or downgrades) we expect the whole cluster to be
>> > >> > running
>> > >> > > > the
>> > >> > > > > >>> same
>> > >> > > > > >>> software version before triggering the metadata.version
>> > >> change,
>> > >> > so
>> > >> > > > > perhaps
>> > >> > > > > >>> this isn't a likely scenario. Thoughts?
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> 3. Max metadata version
>> > >> > > > > >>> > >For the first release that supports metadata.version,
>> we
>> > >> can
>> > >> > > > simply
>> > >> > > > > >>> > initialize metadata.version with the current (and only)
>> > >> > version.
>> > >> > > > For
>> > >> > > > > >>> future
>> > >> > > > > >>> > releases, we will need a mechanism to bootstrap a
>> > particular
>> > >> > > > version.
>> > >> > > > > >>> This
>> > >> > > > > >>> > could be done using the meta.properties file or some
>> > similar
>> > >> > > > > mechanism.
>> > >> > > > > >>> The
>> > >> > > > > >>> > reason we need the allow for a specific initial version
>> is
>> > >> to
>> > >> > > > support
>> > >> > > > > >>> the
>> > >> > > > > >>> > use case of starting a Kafka cluster at version X with
>> an
>> > >> older
>> > >> > > > > >>> > metadata.version.
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> I assume that the Active Controller will learn the
>> metadata
>> > >> > version
>> > >> > > > of
>> > >> > > > > >>> > the broker through the BrokerRegistrationRequest. How
>> will
>> > >> the
>> > >> > > > Active
>> > >> > > > > >>> > Controller learn about the max metadata version of the
>> > >> inactive
>> > >> > > > > >>> > controller nodes? We currently don't send a registration
>> > >> > request
>> > >> > > > from
>> > >> > > > > >>> > the inactive controller to the active controller.
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> This came up during the design, but I neglected to add it
>> to
>> > >> the
>> > >> > > KIP.
>> > >> > > > > We
>> > >> > > > > >>> will need a mechanism for determining the supported
>> features
>> > >> of
>> > >> > > each
>> > >> > > > > >>> controller similar to how brokers use
>> > >> BrokerRegistrationRequest.
>> > >> > > > > Perhaps
>> > >> > > > > >>> controllers could write a FeatureLevelRecord (or similar)
>> to
>> > >> the
>> > >> > > > > metadata
>> > >> > > > > >>> log indicating their supported version. WDYT?
>> > >> > > > > >>>
>> > >> > > > > >>> Why do you need to bootstrap a particular version? Isn't
>> the
>> > >> > intent
>> > >> > > > > >>> > that the broker will learn the active metadata version
>> by
>> > >> > reading
>> > >> > > > the
>> > >> > > > > >>> > metadata before unfencing?
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> This bootstrapping is needed for when a KRaft cluster is
>> > first
>> > >> > > > > started. If
>> > >> > > > > >>> we don't have this mechanism, the cluster can't really do
>> > >> > anything
>> > >> > > > > until
>> > >> > > > > >>> the operator finalizes the metadata.version with the tool.
>> > The
>> > >> > > > > >>> bootstrapping will be done by the controller and the
>> brokers
>> > >> will
>> > >> > > see
>> > >> > > > > this
>> > >> > > > > >>> version as a record (like you say). I'll add some text to
>> > >> clarify
>> > >> > > > this.
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> 4. Reject Registration - This is related to the bullet
>> point
>> > >> > above.
>> > >> > > > > >>> > What will be the behavior of the active controller if
>> the
>> > >> > broker
>> > >> > > > > sends
>> > >> > > > > >>> > a metadata version that is not compatible with the
>> cluster
>> > >> wide
>> > >> > > > > >>> > metadata version?
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> If a broker starts up with a lower supported version range
>> > >> than
>> > >> > the
>> > >> > > > > >>> current
>> > >> > > > > >>> cluster metadata.version, it should log an error and
>> > shutdown.
>> > >> > This
>> > >> > > > is
>> > >> > > > > in
>> > >> > > > > >>> line with KIP-584.
>> > >> > > > > >>>
>> > >> > > > > >>> 5. Discover upgrade
>> > >> > > > > >>> > > This snapshot will be a convenient way to let broker
>> and
>> > >> > > > controller
>> > >> > > > > >>> > components rebuild their entire in-memory state
>> following
>> > an
>> > >> > > > upgrade.
>> > >> > > > > >>> > Can we rely on the presence of the FeatureLevelRecord
>> for
>> > >> the
>> > >> > > > > metadata
>> > >> > > > > >>> > version for this functionality? If so, it avoids having
>> to
>> > >> > reload
>> > >> > > > the
>> > >> > > > > >>> > snapshot.
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> For upgrades, yes probably since we won't need to
>> "rewrite"
>> > >> any
>> > >> > > > > records in
>> > >> > > > > >>> this case. For downgrades, we will need to generate the
>> > >> snapshot
>> > >> > > and
>> > >> > > > > >>> reload
>> > >> > > > > >>> everything.
>> > >> > > > > >>>
>> > >> > > > > >>> 6. Metadata version specification
>> > >> > > > > >>> > >  V4(version=4, isBackwardsCompatible=false,
>> > >> description="New
>> > >> > > > > metadata
>> > >> > > > > >>> > record type Bar"),
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> Very cool. Do you have plans to generate Apache Kafka HTML
>> > >> > > > > >>> > documentation for this information? Would be helpful to
>> > >> display
>> > >> > > > this
>> > >> > > > > >>> > information to the user using the kafka-features.sh and
>> > >> feature
>> > >> > > > RPC?
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> Hm good idea :) I'll add a brief section on documentation.
>> > >> This
>> > >> > > would
>> > >> > > > > >>> certainly be very useful
>> > >> > > > > >>>
>> > >> > > > > >>> 7.Downgrade records
>> > >> > > > > >>> > I think we should explicitly mention that the downgrade
>> > >> process
>> > >> > > > will
>> > >> > > > > >>> > downgrade both metadata records and controller records.
>> In
>> > >> > > KIP-630
>> > >> > > > we
>> > >> > > > > >>> > introduced two control records for snapshots.
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> Yes, good call. Let me re-read that KIP and include some
>> > >> details.
>> > >> > > > > >>>
>> > >> > > > > >>> Thanks again for the comments!
>> > >> > > > > >>> -David
>> > >> > > > > >>>
>> > >> > > > > >>>
>> > >> > > > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio
>> > >> > > > > >>> <jsan...@confluent.io.invalid> wrote:
>> > >> > > > > >>>
>> > >> > > > > >>> > One more comment.
>> > >> > > > > >>> >
>> > >> > > > > >>> > 7.Downgrade records
>> > >> > > > > >>> > I think we should explicitly mention that the downgrade
>> > >> process
>> > >> > > > will
>> > >> > > > > >>> > downgrade both metadata records and controller records.
>> In
>> > >> > > KIP-630
>> > >> > > > we
>> > >> > > > > >>> > introduced two control records for snapshots.
>> > >> > > > > >>> >
>> > >> > > > > >>>
>> > >> > > > > >>
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >> > >
>> > >> > > --
>> > >> > > -David
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> > >> --
>> > >> -- Guozhang
>> > >>
>> > >
>> > >
>> > > --
>> > > -David
>> > >
>> >
>> >
>> > --
>> > -David
>> >
>>
>>
>> --
>> -- Guozhang
>>
>
>
> -- 
> -David
