Thanks, I think having the leader election consider the metadata.version
would be a good idea moving forward, but we do not need to include it in
this KIP.

On Tue, Nov 16, 2021 at 6:37 AM David Arthur
<david.art...@confluent.io.invalid> wrote:

> Guozhang,
>
> 1. By requiring a majority of all controller nodes to support the version
> selected by the leader, we increase the likelihood that the next leader
> will also support it. We can't guarantee that all nodes definitely support
> the selected metadata.version because there could always be an offline or
> partitioned peer that is running old software.
>
> If a controller running old software manages to get elected, we hit this case:
>
> > In the unlikely event that an active controller encounters an unsupported
> > metadata.version, it should resign and terminate.
>
>
> So, given this, we should be able to eventually elect a controller that
> does support the metadata.version.
>
>
> Consider controllers C1, C2, C3 with this arrangement:
>
> Node  SoftwareVer MaxMetadataVer
> C1    3.2         1
> C2    3.3         4
> C3    3.3         4
>
> If the current metadata.version is 1 and we're trying to upgrade to 4, we
> would allow it since two of the three nodes support it. If any one
> controller is down while we are attempting an upgrade, we would require
> that both of the remaining alive nodes support the target metadata.version
> (since we require a majority of _all_ controller nodes, not just alive
> ones).
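>
> To make the majority rule concrete, here is a rough Java sketch of the
> check (hypothetical names like canFinalize and supportedRanges; this is
> just an illustration, not actual controller code):
>
>     // Decide whether the leader may finalize metadata.version = target,
>     // given the supported ranges it has learned from the quorum voters
>     // (e.g. via ApiVersions). Each range is a {min, max} pair.
>     static boolean canFinalize(java.util.List<int[]> supportedRanges,
>                                int totalVoterCount, int target) {
>         long supporters = supportedRanges.stream()
>             .filter(r -> target >= r[0] && target <= r[1])
>             .count();
>         // Require a strict majority of ALL voters, not just reachable
>         // ones; an offline or partitioned peer counts against the upgrade.
>         return supporters > totalVoterCount / 2;
>     }
>
> With the table above, ranges {1,1}, {1,4}, {1,4} and target 4 yield two
> supporters out of three voters, so the upgrade would be allowed.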
>
> An interesting case here is how to handle a version update if a majority of
> the quorum supports it, but the leader doesn't. For example, if C1 were the
> leader and an upgrade to version 4 were requested, this might trigger C1 to
> resign and inform the client to retry the update later.
>
> We may eventually want to consider the metadata.version when electing a
> leader, but as long as we have the majority requirement before committing a
> new metadata.version, I think we should be safe.
>
> -David
>
> On Mon, Nov 15, 2021 at 12:52 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Thanks David,
> >
> > 1. Got it. One thing I'm still not very clear on is why it's sufficient
> > to select a metadata.version which is supported by a majority of the
> > quorum, but not the whole quorum (i.e. choosing the lowest version among
> > all the quorum members)? Since the leader election today does not take
> > this value into consideration, we are not guaranteed that newly elected
> > leaders would always be able to recognize and support the initialized
> > metadata.version, right?
> >
> > 2. Yeah, I think I agree the behavior-but-not-RPC-change scenario is
> > beyond the scope of this KIP; we can defer it to later discussions.
> >
> > On Mon, Nov 15, 2021 at 8:13 AM David Arthur
> > <david.art...@confluent.io.invalid> wrote:
> >
> > > Guozhang, thanks for the review!
> > >
> > > 1. As we've defined it so far, the initial metadata.version is set by
> > > an operator via the "kafka-storage.sh" tool. It would be possible for
> > > different values to be selected, but only the quorum leader would commit
> > > a FeatureLevelRecord with the version it read locally. See the above
> > > reply to Jun's question for a little more detail.
> > >
> > > We need to enable the KRaft RPCs regardless of metadata.version (vote,
> > > heartbeat, fetch, etc.) so that the quorum can be formed, a leader can
> > > be elected, and followers can learn about the selected metadata.version.
> > > I believe the quorum startup procedure would go something like:
> > >
> > > * Controller nodes start, read their logs, begin leader election
> > > * While the elected node is becoming leader (in
> > > QuorumMetaLogListener#handleLeaderChange), initialize metadata.version
> > > if necessary
> > > * Followers replicate the FeatureLevelRecord
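> > >
> > > As a very rough sketch (hypothetical method and field names, not the
> > > actual QuorumController code), the leader-side initialization in the
> > > second step might look like:
> > >
> > >     // Invoked when this node wins the election. Only write the
> > >     // bootstrap FeatureLevelRecord if no metadata.version was
> > >     // replayed from the local log.
> > >     public void handleLeaderChange(int newLeaderId) {
> > >         if (newLeaderId == localNodeId
> > >                 && finalizedMetadataVersion.isEmpty()) {
> > >             short bootstrap = readBootstrapVersionFromMetaProperties();
> > >             appendRecord(new FeatureLevelRecord()
> > >                 .setName("metadata.version")
> > >                 .setFeatureLevel(bootstrap));
> > >         }
> > >     }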
> > >
> > > This (probably) means the quorum peers must continue to rely on
> > > ApiVersions to negotiate compatible RPC versions, and these versions
> > > cannot depend on metadata.version.
> > >
> > > Does this make sense?
> > >
> > >
> > > 2. ApiVersionsResponse includes the broker's supported feature flags as
> > > well as the cluster-wide finalized feature flags. We probably need to
> > > add something like a feature flag "epoch" to this response payload in
> > > order to see which broker is most up to date.
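> > >
> > > For example (a sketch only; the finalizedFeaturesEpoch accessor is a
> > > hypothetical name for the proposed epoch field, not a field in the
> > > current ApiVersions schema):
> > >
> > >     // Pick the most up-to-date broker by comparing the epoch at
> > >     // which each broker's finalized features were committed.
> > >     ApiVersionsResponse freshest = responses.stream()
> > >         .max(java.util.Comparator.comparingLong(
> > >             r -> r.finalizedFeaturesEpoch()))
> > >         .orElseThrow();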
> > >
> > > If the new feature flag version includes RPC changes, we are helped by
> > > the fact that a client won't attempt to use the new RPC until it has
> > > discovered a broker that supports it via ApiVersions. The problem is
> > > more difficult for cases like you described, where the feature flag
> > > upgrade changes the behavior of the broker but not its RPCs. This is
> > > actually the same problem as upgrading the IBP. During a rolling
> > > restart, clients may hit different brokers with different capabilities
> > > and not know it.
> > >
> > > This probably warrants further investigation, but hopefully you agree it
> > > is beyond the scope of this KIP :)
> > >
> > > -David
> > >
> > >
> > > On Mon, Nov 15, 2021 at 10:26 AM David Arthur <david.art...@confluent.io>
> > > wrote:
> > >
> > > > Jun, thanks for the comments!
> > > >
> > > > 16. When a new cluster is deployed, we don't select the highest
> > > > available metadata.version, but rather the quorum leader picks a
> > > > bootstrap version defined in meta.properties. As mentioned earlier, we
> > > > should add validation here to ensure a majority of the followers could
> > > > support this version before initializing it. This would avoid a
> > > > situation where a failover results in a new leader who can't support
> > > > the selected metadata.version.
> > > >
> > > > Thinking a bit more on this, we do need to define a state where we're
> > > > running newer software, but we don't have the feature flag set. This
> > > > could happen if we were running an older IBP that did not support
> > > > KIP-778. Following on this, it doesn't seem too difficult to consider
> > > > a case where the IBP has been upgraded, but we still have not
> > > > finalized a metadata.version. Here are some possible version
> > > > combinations (assuming KIP-778 is added to Kafka 3.2):
> > > >
> > > > Case  Software  IBP   metadata.version  effective version
> > > > ---------------------------------------------------------------------
> > > > A     3.1       3.1   -                 0  (software too old for feature flag)
> > > > B     3.2       3.1   -                 0  (feature flag supported, but IBP too old)
> > > > C     3.2       3.2   -                 0  (feature flag supported, but not initialized)
> > > > D     3.2       3.2   1                 1  (feature flag initialized to 1 via operator or bootstrap process)
> > > > ...
> > > > E     3.8       3.1   -                 0  (IBP too old)
> > > > F     3.8       3.2   -                 0  (not initialized)
> > > > G     3.8       3.2   4                 4
> > > >
> > > >
> > > > Here I'm defining version 0 as "no metadata.version set". So back to
> > > > your comment, I think the KIP omits discussion of case C from the
> > > > above table, which I can amend. Does that cover your concerns, or am I
> > > > missing something else?
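> > > >
> > > > In code form, the "effective version" column boils down to something
> > > > like this (a sketch only; the parameter names are made up):
> > > >
> > > >     // Cases A, B, E: the flag cannot be used at all, so we get 0.
> > > >     // Cases C, F: supported but never finalized, also 0.
> > > >     // Cases D, G: the finalized feature level wins.
> > > >     static int effectiveVersion(boolean softwareSupportsFlag,
> > > >                                 boolean ibpSupportsFlag,
> > > >                                 java.util.OptionalInt finalizedLevel) {
> > > >         if (!softwareSupportsFlag || !ibpSupportsFlag) {
> > > >             return 0;
> > > >         }
> > > >         return finalizedLevel.orElse(0);
> > > >     }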
> > > >
> > > >
> > > > > it's inconvenient for a user to manually upgrade every feature
> > > > > version
> > > >
> > > > For this, we would probably want to extend the capabilities of
> > > > KIP-584. I don't think anything we've discussed for KIP-778 will
> > > > preclude us from adding some kind of auto-upgrade in the future.
> > > >
> > > > 21, "disable" sounds good to me. I agree "delete feature-x" sounds a
> > bit
> > > > weird.
> > > >
> > > >
> > > >
> > > > On Mon, Nov 8, 2021 at 8:47 PM Guozhang Wang <wangg...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hello David,
> > > >>
> > > >> Thanks for the very nice writeup! It helped me a lot to refresh my
> > > >> memory on KIP-630/590/584 :)
> > > >>
> > > >> I just had two clarification questions after reading through the
> > > >> KIP:
> > > >>
> > > >> 1. For the initialization procedure, do we guarantee that all the
> > > >> quorum nodes (inactive candidates and leaders, a.k.a. controllers)
> > > >> would always initialize with the same metadata.version? If yes, how
> > > >> is that guaranteed? More specifically, when a quorum candidate is
> > > >> starting up, would it avoid handling any controller requests
> > > >> (including ApiVersionsRequest) from its peers in the quorum until it
> > > >> completes reading the local log? And even if yes, what would happen
> > > >> if there's no FeatureLevelRecord found, and different nodes read
> > > >> different values from their local meta.properties file or initialize
> > > >> from their binary's hard-coded values?
> > > >>
> > > >> 2. This is not only for the metadata.version itself, but for general
> > > >> feature.versions: when a version is upgraded / downgraded, since
> > > >> brokers would read the FeatureLevelRecord at their own pace, there
> > > >> will always be a race window when some brokers have processed the
> > > >> record and completed the upgrade while others have not, and hence may
> > > >> behave differently --- I'm thinking of future features like the
> > > >> specific replica selector to allow fetching from followers, and even
> > > >> more advanced selectors --- i.e. should we consider letting clients
> > > >> only talk to brokers with the highest metadata log offsets, for
> > > >> example?
> > > >>
> > > >>
> > > >> Guozhang
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Nov 5, 2021 at 3:18 PM Jun Rao <j...@confluent.io.invalid>
> > > >> wrote:
> > > >>
> > > >> > Hi, David,
> > > >> >
> > > >> > Thanks for the reply.
> > > >> >
> > > >> > 16. My first concern is that the KIP picks up metadata.version
> > > >> > inconsistently during the deployment. If a new cluster is started,
> > > >> > we pick up the highest version. If we upgrade, we leave the feature
> > > >> > version unchanged. Intuitively, it seems that independent of how a
> > > >> > cluster is deployed, we should always pick the same feature
> > > >> > version. I think we need to think this through in this KIP. My
> > > >> > second concern is that as a particular version matures, it's
> > > >> > inconvenient for a user to manually upgrade every feature version.
> > > >> > As long as we have a path to achieve that in the future, we don't
> > > >> > need to address that in this KIP.
> > > >> >
> > > >> > 21. "./kafka-features.sh delete": Deleting a feature seems a bit
> > > >> > weird since the logic is always there. Would it be better to use
> > > >> > disable?
> > > >> >
> > > >> > Jun
> > > >> >
> > > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur
> > > >> > <david.art...@confluent.io.invalid> wrote:
> > > >> >
> > > >> > > Colin and Jun, thanks for the additional comments!
> > > >> > >
> > > >> > > Colin:
> > > >> > >
> > > >> > > > We've been talking about having an automated RPC compatibility
> > > >> > > > checker
> > > >> > >
> > > >> > > Do we have a way to mark fields in schemas as deprecated? It can
> > > >> > > stay in the RPC, it just complicates the logic a bit.
> > > >> > >
> > > >> > > > It would be nice if the active controller could validate that a
> > > >> > > > majority of the quorum could use the proposed metadata.version.
> > > >> > > > The active controller should have this information, right? If we
> > > >> > > > don't have recent information from a quorum of voters, we
> > > >> > > > wouldn't be active.
> > > >> > >
> > > >> > > I believe we should have this information from the
> > > >> > > ApiVersionsResponse. It would be good to do this validation to
> > > >> > > avoid a situation where a quorum leader can't be elected due to
> > > >> > > unprocessable records.
> > > >> > >
> > > >> > > > Do we need delete as a command separate from downgrade?
> > > >> > >
> > > >> > > I think from an operator's perspective, it is nice to distinguish
> > > >> > > between changing a feature flag and unsetting it. It might be
> > > >> > > surprising to an operator to see the flag's version set to nothing
> > > >> > > when they requested the downgrade to version 0 (or less).
> > > >> > >
> > > >> > > > it seems like we should spell out that metadata.version begins
> > > >> > > > at 1 in KRaft clusters
> > > >> > >
> > > >> > > I added this text:
> > > >> > >
> > > >> > > > Introduce an IBP version to indicate the lowest software version
> > > >> > > > that supports *metadata.version*. Below this IBP, the
> > > >> > > > *metadata.version* is undefined and will not be examined. At or
> > > >> > > > above this IBP, the *metadata.version* must be *0* for ZooKeeper
> > > >> > > > clusters and will be initialized as *1* for KRaft clusters.
> > > >> > >
> > > >> > >
> > > >> > > > We probably also want an RPC implemented by both brokers and
> > > >> > > > controllers that will reveal the min and max supported versions
> > > >> > > > for each feature level supported by the server
> > > >> > >
> > > >> > > This is available in ApiVersionsResponse (we include the server's
> > > >> > > supported features as well as the cluster's finalized features).
> > > >> > >
> > > >> > > --------
> > > >> > >
> > > >> > > Jun:
> > > >> > >
> > > >> > > 12. I've updated the KIP with AdminClient changes
> > > >> > >
> > > >> > > 14. You're right, it looks like I missed a few sections regarding
> > > >> > > snapshot generation. I've corrected it.
> > > >> > >
> > > >> > > 16. This feels more like an enhancement to KIP-584. I agree it
> > > >> > > could be useful, but perhaps we could address it separately from
> > > >> > > KRaft upgrades?
> > > >> > >
> > > >> > > 20. Indeed, snapshots are not strictly necessary during an
> > > >> > > upgrade; I've reworded this.
> > > >> > >
> > > >> > >
> > > >> > > Thanks!
> > > >> > > David
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <j...@confluent.io.invalid>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi, David, Jose and Colin,
> > > >> > > >
> > > >> > > > Thanks for the reply. A few more comments.
> > > >> > > >
> > > >> > > > 12. It seems that we haven't updated the AdminClient
> > > >> > > > accordingly?
> > > >> > > >
> > > >> > > > 14. "Metadata snapshot is generated and sent to the other
> > > >> > > > inactive controllers and to brokers". I thought we wanted each
> > > >> > > > broker to generate its own snapshot independently? If only the
> > > >> > > > controller generates the snapshot, how do we force other brokers
> > > >> > > > to pick it up?
> > > >> > > >
> > > >> > > > 16. If a feature version is new, one may not want to enable it
> > > >> > > > immediately after the cluster is upgraded. However, if a feature
> > > >> > > > version has been stable, requiring every user to run a command
> > > >> > > > to upgrade to that version seems inconvenient. One way to
> > > >> > > > improve this is for each feature to define one version as the
> > > >> > > > default. Then, when we upgrade a cluster, we will automatically
> > > >> > > > upgrade the feature to the default version. An admin could use
> > > >> > > > the tool to upgrade to a version higher than the default.
> > > >> > > >
> > > >> > > > 20. "The quorum controller can assist with this process by
> > > >> > > > generating a metadata snapshot after a metadata.version increase
> > > >> > > > has been committed to the metadata log. This snapshot will be a
> > > >> > > > convenient way to let broker and controller components rebuild
> > > >> > > > their entire in-memory state following an upgrade." The new
> > > >> > > > version of the software could read both the new and the old
> > > >> > > > version. Is generating a new snapshot during upgrade needed?
> > > >> > > >
> > > >> > > > Jun
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmcc...@apache.org>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> > > >> > > > > > Hi, David,
> > > >> > > > > >
> > > >> > > > > > One more comment.
> > > >> > > > > >
> > > >> > > > > > 16. The main reason why KIP-584 requires finalizing a
> > > >> > > > > > feature manually is that in the ZK world, the controller
> > > >> > > > > > doesn't know all brokers in a cluster. A broker temporarily
> > > >> > > > > > down is not registered in ZK. In the KRaft world, the
> > > >> > > > > > controller keeps track of all brokers, including those that
> > > >> > > > > > are temporarily down. This makes it possible for the
> > > >> > > > > > controller to automatically finalize a feature---it's safe
> > > >> > > > > > to do so when all brokers support that feature. This will
> > > >> > > > > > make the upgrade process much simpler since no manual
> > > >> > > > > > command is required to turn on a new feature. Have we
> > > >> > > > > > considered this?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > >
> > > >> > > > > > Jun
> > > >> > > > >
> > > >> > > > > Hi Jun,
> > > >> > > > >
> > > >> > > > > I guess David commented on this point already, but I'll
> > > >> > > > > comment as well. I always had the perception that users viewed
> > > >> > > > > rolls as potentially risky and were looking for ways to reduce
> > > >> > > > > the risk. Not enabling features right away after installing
> > > >> > > > > new software seems like one way to do that. If we had a
> > > >> > > > > feature to automatically upgrade during a roll, I'm not sure
> > > >> > > > > that I would recommend that people use it, because if
> > > >> > > > > something fails, it makes it harder to tell if the new feature
> > > >> > > > > is at fault, or something else in the new software.
> > > >> > > > >
> > > >> > > > > We already tell users to do a "double roll" when going to a
> > > >> > > > > new IBP. (Just to give background to people who haven't heard
> > > >> > > > > that phrase, the first roll installs the new software, and the
> > > >> > > > > second roll updates the IBP.) So this KIP-778 mechanism is
> > > >> > > > > basically very similar to that, except the second thing isn't
> > > >> > > > > a roll, but just an upgrade command. So I think this is
> > > >> > > > > consistent with what we currently do.
> > > >> > > > >
> > > >> > > > > Also, just like David said, we can always add auto-upgrade
> > > >> > > > > later if there is demand...
> > > >> > > > >
> > > >> > > > > best,
> > > >> > > > > Colin
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > >
> > > >> > > > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <j...@confluent.io>
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > >> Hi, David,
> > > >> > > > > >>
> > > >> > > > > >> Thanks for the KIP. A few comments below.
> > > >> > > > > >>
> > > >> > > > > >> 10. It would be useful to describe how the controller node
> > > >> > > > > >> determines the RPC version used to communicate to other
> > > >> > > > > >> controller nodes. There seems to be a bootstrap problem. A
> > > >> > > > > >> controller node can't read the log, and therefore the
> > > >> > > > > >> feature level, until a quorum leader is elected. But leader
> > > >> > > > > >> election requires an RPC.
> > > >> > > > > >>
> > > >> > > > > >> 11. For downgrades, it would be useful to describe how to
> > > >> > > > > >> determine that the downgrade process (generating a new
> > > >> > > > > >> snapshot, propagating the snapshot, etc.) has completed. We
> > > >> > > > > >> could block the UpdateFeature request until the process is
> > > >> > > > > >> completed. However, since the process could take time, the
> > > >> > > > > >> request could time out. Another way is through
> > > >> > > > > >> DescribeFeature, where the server only reports downgraded
> > > >> > > > > >> versions after the process is completed.
> > > >> > > > > >>
> > > >> > > > > >> 12. Since we are changing UpdateFeaturesRequest, do we need
> > > >> > > > > >> to change the AdminClient API for updateFeatures too?
> > > >> > > > > >>
> > > >> > > > > >> 13. For the paragraph starting with "In the absence of an
> > > >> > > > > >> operator defined value for metadata.version": in KIP-584,
> > > >> > > > > >> we described how to finalize features with New cluster
> > > >> > > > > >> bootstrap. In that case, it's inconvenient for the users to
> > > >> > > > > >> have to run an admin tool to finalize the version for each
> > > >> > > > > >> feature. Instead, the system detects that the /features
> > > >> > > > > >> path is missing in ZK and thus automatically finalizes
> > > >> > > > > >> every feature with the latest supported version. Could we
> > > >> > > > > >> do something similar in the KRaft mode?
> > > >> > > > > >>
> > > >> > > > > >> 14. After the quorum leader generates a new snapshot, how
> > > >> > > > > >> do we force other nodes to pick up the new snapshot?
> > > >> > > > > >>
> > > >> > > > > >> 15. I agree with Jose that it will be useful to describe
> > > >> > > > > >> when generating a new snapshot is needed. To me, it seems
> > > >> > > > > >> the new snapshot is only needed when incompatible changes
> > > >> > > > > >> are made.
> > > >> > > > > >>
> > > >> > > > > >> 7. Jose, what control records were you referring to?
> > > >> > > > > >>
> > > >> > > > > >> Thanks,
> > > >> > > > > >>
> > > >> > > > > >> Jun
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org>
> > > >> > > > > >> wrote:
> > > >> > > > > >>
> > > >> > > > > >>> Jose, thanks for the thorough review and comments!
> > > >> > > > > >>>
> > > >> > > > > >>> I am out of the office until next week, so I probably
> > > >> > > > > >>> won't be able to update the KIP until then. Here are some
> > > >> > > > > >>> replies to your questions:
> > > >> > > > > >>>
> > > >> > > > > >>> 1. Generate snapshot on upgrade
> > > >> > > > > >>> > > Metadata snapshot is generated and sent to the other
> > > >> > > > > >>> > > nodes
> > > >> > > > > >>> > Why does the Active Controller need to generate a new
> > > >> > > > > >>> > snapshot and force a snapshot fetch from the replicas
> > > >> > > > > >>> > (inactive controller and brokers) on an upgrade? Isn't
> > > >> > > > > >>> > writing the FeatureLevelRecord good enough to
> > > >> > > > > >>> > communicate the upgrade to the replicas?
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> You're right, we don't necessarily need to _transmit_ a
> > > >> > > > > >>> snapshot, since each node can generate its own equivalent
> > > >> > > > > >>> snapshot.
> > > >> > > > > >>>
> > > >> > > > > >>> 2. Generate snapshot on downgrade
> > > >> > > > > >>> > > Metadata snapshot is generated and sent to the other
> > > >> > > > > >>> > > inactive controllers and to brokers (this snapshot may
> > > >> > > > > >>> > > be lossy!)
> > > >> > > > > >>> > Why do we need to send this downgraded snapshot to the
> > > >> > > > > >>> > brokers? The replicas have seen the FeatureLevelRecord
> > > >> > > > > >>> > and noticed the downgrade. Can we have the replicas each
> > > >> > > > > >>> > independently generate a downgraded snapshot at the
> > > >> > > > > >>> > offset for the downgraded FeatureLevelRecord? I assume
> > > >> > > > > >>> > that the active controller will guarantee that all
> > > >> > > > > >>> > records after the FeatureLevelRecord use the downgraded
> > > >> > > > > >>> > version. If so, it would be good to mention that
> > > >> > > > > >>> > explicitly.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> Similar to above, yes, a broker that detects a downgrade
> > > >> > > > > >>> via FeatureLevelRecord could generate its own downgrade
> > > >> > > > > >>> snapshot and reload its state from that. This does get a
> > > >> > > > > >>> little fuzzy when we consider cases where brokers are on
> > > >> > > > > >>> different software versions and could be generating a
> > > >> > > > > >>> downgrade snapshot for version X, but using different
> > > >> > > > > >>> versions of the code. It might be safer to let the
> > > >> > > > > >>> controller generate the snapshot so each broker
> > > >> > > > > >>> (regardless of software version) gets the same records.
> > > >> > > > > >>> However, for upgrades (or downgrades) we expect the whole
> > > >> > > > > >>> cluster to be running the same software version before
> > > >> > > > > >>> triggering the metadata.version change, so perhaps this
> > > >> > > > > >>> isn't a likely scenario. Thoughts?
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> 3. Max metadata version
> > > >> > > > > >>> > > For the first release that supports metadata.version,
> > > >> > > > > >>> > > we can simply initialize metadata.version with the
> > > >> > > > > >>> > > current (and only) version. For future releases, we
> > > >> > > > > >>> > > will need a mechanism to bootstrap a particular
> > > >> > > > > >>> > > version. This could be done using the meta.properties
> > > >> > > > > >>> > > file or some similar mechanism. The reason we need to
> > > >> > > > > >>> > > allow for a specific initial version is to support the
> > > >> > > > > >>> > > use case of starting a Kafka cluster at version X with
> > > >> > > > > >>> > > an older metadata.version.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> > I assume that the Active Controller will learn the
> > > >> > > > > >>> > metadata version of the broker through the
> > > >> > > > > >>> > BrokerRegistrationRequest. How will the Active
> > > >> > > > > >>> > Controller learn about the max metadata version of the
> > > >> > > > > >>> > inactive controller nodes? We currently don't send a
> > > >> > > > > >>> > registration request from the inactive controller to
> > > >> > > > > >>> > the active controller.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> This came up during the design, but I neglected to add it
> > > >> > > > > >>> to the KIP. We will need a mechanism for determining the
> > > >> > > > > >>> supported features of each controller, similar to how
> > > >> > > > > >>> brokers use BrokerRegistrationRequest. Perhaps controllers
> > > >> > > > > >>> could write a FeatureLevelRecord (or similar) to the
> > > >> > > > > >>> metadata log indicating their supported version. WDYT?
> > > >> > > > > >>>
> > > >> > > > > >>> > Why do you need to bootstrap a particular version? Isn't
> > > >> > > > > >>> > the intent that the broker will learn the active
> > > >> > > > > >>> > metadata version by reading the metadata before
> > > >> > > > > >>> > unfencing?
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> This bootstrapping is needed for when a KRaft cluster is
> > > >> > > > > >>> first started. If we don't have this mechanism, the
> > > >> > > > > >>> cluster can't really do anything until the operator
> > > >> > > > > >>> finalizes the metadata.version with the tool. The
> > > >> > > > > >>> bootstrapping will be done by the controller, and the
> > > >> > > > > >>> brokers will see this version as a record (like you say).
> > > >> > > > > >>> I'll add some text to clarify this.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> 4. Reject Registration - This is related to the bullet
> > > >> > > > > >>> point above.
> > > >> > > > > >>> > What will be the behavior of the active controller if
> > > >> > > > > >>> > the broker sends a metadata version that is not
> > > >> > > > > >>> > compatible with the cluster wide metadata version?
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> If a broker starts up with a lower supported version range
> > > >> > > > > >>> than the current cluster metadata.version, it should log
> > > >> > > > > >>> an error and shut down. This is in line with KIP-584.
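> > > >> > > > > >>>
> > > >> > > > > >>> Roughly, as a sketch (made-up variable names, not the
> > > >> > > > > >>> real broker startup code):
> > > >> > > > > >>>
> > > >> > > > > >>>     // Compare the broker's supported range for
> > > >> > > > > >>>     // metadata.version against the cluster's level.
> > > >> > > > > >>>     if (finalized < supportedMin || finalized > supportedMax) {
> > > >> > > > > >>>         log.error("metadata.version {} is outside the "
> > > >> > > > > >>>             + "supported range [{}, {}]; shutting down.",
> > > >> > > > > >>>             finalized, supportedMin, supportedMax);
> > > >> > > > > >>>         System.exit(1);
> > > >> > > > > >>>     }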
> > > >> > > > > >>>
> > > >> > > > > >>> 5. Discover upgrade
> > > >> > > > > >>> > > This snapshot will be a convenient way to let broker
> > > >> > > > > >>> > > and controller components rebuild their entire
> > > >> > > > > >>> > > in-memory state following an upgrade.
> > > >> > > > > >>> > Can we rely on the presence of the FeatureLevelRecord
> > > >> > > > > >>> > for the metadata version for this functionality? If so,
> > > >> > > > > >>> > it avoids having to reload the snapshot.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> For upgrades, yes, probably, since we won't need to
> > > >> > > > > >>> "rewrite" any records in this case. For downgrades, we
> > > >> > > > > >>> will need to generate the snapshot and reload everything.
> > > >> > > > > >>>
> > > >> > > > > >>> 6. Metadata version specification
> > > >> > > > > >>> > > V4(version=4, isBackwardsCompatible=false,
> > > >> > > > > >>> > > description="New metadata record type Bar"),
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> > Very cool. Do you have plans to generate Apache Kafka
> > > >> > > > > >>> > HTML documentation for this information? Would be
> > > >> > > > > >>> > helpful to display this information to the user using
> > > >> > > > > >>> > the kafka-features.sh and feature RPC?
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> Hm, good idea :) I'll add a brief section on
> > > >> > > > > >>> documentation. This would certainly be very useful.
> > > >> > > > > >>>
> > > >> > > > > >>> 7. Downgrade records
> > > >> > > > > >>> > I think we should explicitly mention that the downgrade
> > > >> > > > > >>> > process will downgrade both metadata records and
> > > >> > > > > >>> > controller records. In KIP-630 we introduced two control
> > > >> > > > > >>> > records for snapshots.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> Yes, good call. Let me re-read that KIP and include some
> > > >> > > > > >>> details.
> > > >> > > > > >>>
> > > >> > > > > >>> Thanks again for the comments!
> > > >> > > > > >>> -David
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio
> > > >> > > > > >>> <jsan...@confluent.io.invalid> wrote:
> > > >> > > > > >>>
> > > >> > > > > >>> > One more comment.
> > > >> > > > > >>> >
> > > >> > > > > >>> > 7. Downgrade records
> > > >> > > > > >>> > I think we should explicitly mention that the downgrade
> > > >> > > > > >>> > process will downgrade both metadata records and
> > > >> > > > > >>> > controller records. In KIP-630 we introduced two control
> > > >> > > > > >>> > records for snapshots.
> > > >> > > > > >>> >
> > > >> > > > > >>>
> > > >> > > > > >>
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > -David
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > > >
> > > > --
> > > > -David
> > > >
> > >
> > >
> > > --
> > > -David
> > >
> >
> >
> > --
> > -- Guozhang
> >
>
>
> --
> -David
>


-- 
-- Guozhang
