Jun,

The KIP includes some changes to the kafka-features.sh CLI from KIP-584. With
Jason's suggestion incorporated, these three commands would look like:

1) kafka-features.sh upgrade --release latest
Upgrades all known features to their defaults in the latest release.

2) kafka-features.sh downgrade --release 3.x
Downgrades all known features to the default versions from 3.x.

3) kafka-features.sh describe --release latest
Describes the known features of the latest release.

The --upgrade-all and --downgrade-all flags are replaced by specifying each
feature+version explicitly or by giving the --release argument. One problem
with --downgrade-all is that it's not clear what the target versions should
be: the previous version before the last upgrade, or the lowest supported
version. Since downgrades will be less common, I think it's better to require
the operator to be explicit about the desired downgrade version (either
through [--feature X --version Y] or [--release 3.1]; rough sketch below).
Does that seem reasonable?
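
As a sketch (the exact flag names are still up for discussion), those two
explicit downgrade forms would look something like:

kafka-features.sh downgrade --feature metadata.version --version 1 --bootstrap-server kafka-broker0.prn1:9071
Downgrades only metadata.version to level 1, leaving other features untouched.

kafka-features.sh downgrade --release 3.1 --bootstrap-server kafka-broker0.prn1:9071
Downgrades all known features to the default versions from the 3.1 release.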

16. For case C, I think we will always want to use the latest
metadata.version for new clusters (like we do with IBP today). We can always
change what we mean by "default" down the road. This will also likely become
something we call out in the release and upgrade notes for each release.
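
As a concrete sketch for case C (assuming the --metadata-version flag on
kafka-storage.sh that comes up further down in this thread), formatting a new
cluster without the flag would pick up the latest default, while an operator
could still pin an older level explicitly:

kafka-storage.sh format --cluster-id <uuid> --config server.properties
Formats the cluster with the latest known metadata.version as the default.

kafka-storage.sh format --cluster-id <uuid> --config server.properties --metadata-version 1
Pins the initial metadata.version to 1 (e.g. for compatibility with preview releases).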

-David

On Thu, Nov 18, 2021 at 1:14 PM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Jason, David,
>
> Just to clarify on the interaction with the end user, the design in KIP-584
> allows one to do the following.
> (1) kafka-features.sh  --upgrade-all --bootstrap-server
> kafka-broker0.prn1:9071 to upgrade all features to the latest version known
> by the tool. The user doesn't need to know a specific feature version.
> (2) kafka-features.sh  --downgrade-all --bootstrap-server
> kafka-broker0.prn1:9071 to downgrade all features to the latest version
> known by the tool. The user doesn't need to know a specific feature
> version.
> (3) kafka-features.sh  --describe --bootstrap-server
> kafka-broker0.prn1:9071 to find out the supported version for each feature.
> This allows the user to upgrade each feature individually.
>
> Most users will be doing (1) and occasionally (2), and won't need to know
> the exact version of each feature.
>
> 16. For case C, what's the default version? Is it always the latest?
>
> Thanks,
>
> Jun
>
>
> On Thu, Nov 18, 2021 at 8:15 AM David Arthur
> <david.art...@confluent.io.invalid> wrote:
>
> > Colin, thanks for the detailed response. I understand what you're saying
> > and I agree with your rationale.
> >
> > It seems like we could just initialize their cluster.metadata to 1 when
> the
> > > software is upgraded to 3.2.
> > >
> >
> > Concretely, this means the controller would see that there is no
> > FeatureLevelRecord in the log, and so it would bootstrap one. Normally
> for
> > cluster initialization, this would be read from meta.properties, but in
> the
> > case of preview clusters we would need to special case it to initialize
> the
> > version to 1.
> >
> > Once the new FeatureLevelRecord has been committed, any nodes (brokers or
> > controllers) that are running the latest software will react to the new
> > metadata.version. This means we will need to make this initial version
> of 1
> > be backwards compatible to what we have in 3.0 and 3.1 (since some
> brokers
> > will be on the new software and some on older/preview versions)
> >
> > I agree that auto-upgrading preview users from IBP 3.0 to
> metadata.version
> > 1 (equivalent to IBP 3.2) is probably fine.
> >
> > Back to Jun's case B:
> >
> > b. We upgrade from an old version where no metadata.version has been
> > > finalized. In this case, it makes sense to leave metadata.version
> > disabled
> > > since we don't know if all brokers have been upgraded.
> >
> >
> > Instead of leaving it uninitialized, we would initialize it to 1 which
> > would be backwards compatible to 3.0 and 3.1. This would eliminate the
> need
> > to check a "final IBP" or deal with 3.2+ clusters without a
> > metadata.version set. The downgrade path for 3.2 back to a preview
> release
> > should be fine since we are saying that metadata.version 1 is compatible
> > with those releases. The FeatureLevelRecord would exist, but it would be
> > ignored (we might need to make sure this actually works in 3.0).
> >
> > For clarity, here are the three workflows of Jun's three cases:
> >
> > A (KRaft 3.2+ to KRaft 3.2+):
> > * User initializes cluster with an explicit metadata.version X, or the
> tool
> > selects the default
> > * After upgrade, cluster stays at version X until operator upgrades to Y
> >
> > B (KRaft Preview to KRaft 3.2+):
> > * User initializes cluster without metadata.version
> > * After controller leader is upgraded, initializes metadata.version to 1
> > * After the cluster is upgraded, an operator may further upgrade to
> version
> > Y
> >
> > C (New KRaft 3.2+):
> > * User initializes cluster with an explicit metadata.version X, or the
> tool
> > selects the default
> >
> >
> > This has been mentioned in this thread, but we should explicitly call out
> > that the absence of metadata.version in meta.properties will be used
> > to identify that we are in case B. After 3.2, we will require that
> > metadata.version is present in meta.properties for new clusters. If users
> > omit the --metadata-version flag from kafka-storage.sh, we should fill
> in a
> > default.
> >
> > So how about the following changes/clarifications to the KIP:
> >
> > * Starting with 3.2, metadata.version is required in KRaft, IBP is
> ignored
> > * If meta.properties does not include metadata.version, it indicates this
> > cluster was initialized with a preview release
> > * If upgrading from a preview release to 3.2+, the controller will
> > initialize metadata.version=1
> > * metadata.version=1 is backwards compatible with preview releases
> >
> > -David
> >
> >
> > On Thu, Nov 18, 2021 at 10:02 AM David Arthur <david.art...@confluent.io
> >
> > wrote:
> >
> > > Jason,
> > >
> > > 1/2. You've got it right. The intention is that metadata.version will
> > gate
> > > both RPCs (like IBP does) as well as metadata records. So, when a
> broker
> > > sees that metadata.version changed, it may start advertising new RPCs
> and
> > > it will need to refresh the ApiVersions it has for other brokers. I can
> > try
> > > to make this more explicit in the KIP.
> > >
> > > 3. Yes, that's the intention of the --dry-run flag. The current
> > > implementation in kafka-features.sh does a dry run on the client side,
> > but
> > > this KIP pushes the validation down to the controller. This will allow
> us
> > > to have the context needed to do proper validation
> > >
> > > Re: version number complexity -- yes this has come up in offline
> > > discussions. With just one feature flag to manage, it's not so bad, but
> > > explicitly managing even a few would be a burden. I like your
> suggestion
> > of
> > > the "--release" flag. That could act as a sort of manifest of versions
> > > (i.e., the defaults from that release). We could also support something
> > > like "kafka-features upgrade --release latest" to bring everything to
> the
> > > highest supported version.
> > >
> > >
> > >
> > > On Wed, Nov 17, 2021 at 9:36 PM Jason Gustafson
> > <ja...@confluent.io.invalid>
> > > wrote:
> > >
> > >> Hi Colin/David,
> > >>
> > >> > Like David said, basically it boils down to creating a feature flag
> > for
> > >> the new proposed __consumer_offsets version, or using a new
> > >> IBP/metadata.version for it. Both approaches have pros and cons. Using
> > an
> > >> IBP/metadata.version bump reduces the size of the testing matrix. But
> > >> using
> > >> a feature flag allows people to avoid any bugs or pain associated with
> > the
> > >> change if they don't care about enabling it. This is basically the
> > classic
> > >> "should I use a feature flag or not?" discussion and we need to have
> it
> > on
> > >> a case-by-case basis.
> > >>
> > >> I think most users are not going to care to manage versions for a
> bunch
> > of
> > >> different features. The IBP today has many shortcomings, but at least
> > it's
> > >> tied to a version that users understand (i.e. the release version).
> How
> > >> would users know after upgrading to Kafka 3.1, for example, that they
> > need
> > >> to upgrade the metadata.version to 3  and offsets.version to 4 (or
> > >> whatever)? It's a lot of overhead trying to understand all of the
> > >> potential
> > >> features and what each upgrade actually means to them. I am wondering
> if
> > >> we
> > >> could give them something more convenient which is tied to the release
> > >> version. For example, maybe we could use a command like
> `kafka-features
> > >> upgrade --release 3.1`, which the broker would then translate to an
> > >> upgrade
> > >> to the latest versions of the individual features at the time of the
> 3.1
> > >> release. Basically it's just a static map from release version to
> > feature
> > >> versions to make the upgrade process more convenient.
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Nov 17, 2021 at 6:20 PM Jason Gustafson <ja...@confluent.io>
> > >> wrote:
> > >>
> > >> > A few additional questions:
> > >> >
> > >> > 1. Currently the IBP tells us what version of individual
> inter-broker
> > >> RPCs
> > >> > will be used. I think the plan in this KIP is to use ApiVersions
> > request
> > >> > instead to find the highest compatible version (just like clients).
> > Do I
> > >> > have that right?
> > >> >
> > >> > 2. The following wasn't very clear to me:
> > >> >
> > >> > > Brokers will be able to observe changes to metadata.version by
> > >> observing
> > >> > the metadata log, and could then submit a new ApiVersionsRequest to
> > the
> > >> > other Kafka nodes.
> > >> >
> > >> > Is the purpose of submitting new ApiVersions requests to update the
> > >> > features or the RPC versions? Does metadata.version also influence
> the
> > >> > versions that a broker advertises? It would help to have more detail
> > >> about
> > >> > this.
> > >> >
> > >> > 3. I imagine users will want to know before performing an upgrade
> > >> whether
> > >> > downgrading will be safe. Would the --dry-run flag tell them this?
> > >> >
> > >> > Thanks,
> > >> > Jason
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Nov 17, 2021 at 3:55 PM Colin McCabe <cmcc...@apache.org>
> > >> wrote:
> > >> >
> > >> >> On Wed, Nov 17, 2021, at 11:28, Jason Gustafson wrote:
> > >> >> > Hi David,
> > >> >> >
> > >> >> > Forgive me if this ground has been covered already. Today, we
> have
> > a
> > >> few
> > >> >> > other things that we have latched onto the IBP, such as upgrades
> to
> > >> the
> > >> >> > format of records in __consumer_offsets. I've been assuming that
> > >> >> > metadata.version is not covering this. Is that right or is there
> > some
> > >> >> other
> > >> >> > plan to take care of cases like this?
> > >> >> >
> > >> >>
> > >> >> I think metadata.version could cover changes to things like
> > >> >> __consumer_offsets, if people want it to. Or to put it another way,
> > >> that is
> > >> >> out of scope for this KIP.
> > >> >>
> > >> >> Like David said, basically it boils down to creating a feature flag
> > for
> > >> >> the new proposed __consumer_offsets version, or using a new
> > >> >> IBP/metadata.version for it. Both approaches have pros and cons.
> > Using
> > >> an
> > >> >> IBP/metadata.version bump reduces the size of the testing matrix.
> But
> > >> using
> > >> >> a feature flag allows people to avoid any bugs or pain associated
> > with
> > >> the
> > >> >> change if they don't care about enabling it. This is basically the
> > >> classic
> > >> >> "should I use a feature flag or not?" discussion and we need to
> have
> > >> it on
> > >> >> a case-by-case basis.
> > >> >>
> > >> >> I think it's worth calling out that having a 1:1 mapping between
> IBP
> > >> >> versions and metadata.versions will result in some
> metadata.versions
> > >> that
> > >> >> "don't do anything" (aka they do the same thing as the previous
> > >> >> metadata.version). For example, if we change StopReplicaRequest
> > again,
> > >> that
> > >> >> will not affect KRaft mode, but probably would require an IBP bump
> > and
> > >> >> hence a metadata.version bump. I think that's OK. It's not that
> > >> different
> > >> >> from updating your IBP and getting support for ZStandard, when your
> > >> >> deployment doesn't use ZStandard compression.
> > >> >>
> > >> >> best,
> > >> >> Colin
> > >> >>
> > >> >> > Thanks,
> > >> >> > Jason
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Nov 17, 2021 at 10:17 AM Jun Rao
> <j...@confluent.io.invalid
> > >
> > >> >> wrote:
> > >> >> >
> > >> >> >> Hi, Colin,
> > >> >> >>
> > >> >> >> Thanks for the reply.
> > >> >> >>
> > >> >> >> For case b, I am not sure that I understand your suggestion.
> Does
> > >> "each
> > >> >> >> subsequent level for metadata.version corresponds to an IBP
> > version"
> > >> >> mean
> > >> >> >> that we need to keep IBP forever? Could you describe the upgrade
> > >> >> process in
> > >> >> >> this case?
> > >> >> >>
> > >> >> >> Jun
> > >> >> >>
> > >> >> >> On Tue, Nov 16, 2021 at 3:45 PM Colin McCabe <
> cmcc...@apache.org>
> > >> >> wrote:
> > >> >> >>
> > >> >> >> > On Tue, Nov 16, 2021, at 15:13, Jun Rao wrote:
> > >> >> >> > > Hi, David, Colin,
> > >> >> >> > >
> > >> >> >> > > Thanks for the reply.
> > >> >> >> > >
> > >> >> >> > > 16. Discussed with David offline a bit. We have 3 cases.
> > >> >> >> > > a. We upgrade from an old version where the metadata.version
> > has
> > >> >> >> already
> > >> >> >> > > been finalized. In this case it makes sense to stay with
> that
> > >> >> feature
> > >> >> >> > > version after the upgrade.
> > >> >> >> >
> > >> >> >> > +1
> > >> >> >> >
> > >> >> >> > > b. We upgrade from an old version where no metadata.version
> > has
> > >> >> been
> > >> >> >> > > finalized. In this case, it makes sense to leave
> > >> metadata.version
> > >> >> >> > disabled
> > >> >> >> > > since we don't know if all brokers have been upgraded.
> > >> >> >> >
> > >> >> >> > This is the scenario I was hoping to avoid by saying that ALL
> > >> KRaft
> > >> >> >> > clusters have metadata.version of at least 1, and each
> > subsequent
> > >> >> level
> > >> >> >> for
> > >> >> >> > metadata.version corresponds to an IBP version. The existing
> > KRaft
> > >> >> >> clusters
> > >> >> >> > in 3.0 and earlier are preview (not for production) so I think
> > >> this
> > >> >> >> change
> > >> >> >> > is OK for 3.x (given that it affects only KRaft). Then IBP is
> > >> >> irrelevant
> > >> >> >> > for KRaft clusters (the config is ignored, possibly with a
> WARN
> > or
> > >> >> ERROR
> > >> >> >> > message generated if it is set).
> > >> >> >> >
> > >> >> >> > > c. We are starting from a brand new cluster and of course no
> > >> >> >> > > metadata.version has been finalized. In this case, the KIP
> > says
> > >> it
> > >> >> will
> > >> >> >> > > pick the metadata.version in meta.properties. In the common
> > >> case,
> > >> >> >> people
> > >> >> >> > > probably won't set the metadata.version in the
> meta.properties
> > >> file
> > >> >> >> > > explicitly. So, it will be useful to put a default (stable)
> > >> version
> > >> >> >> there
> > >> >> >> > > when the meta.properties.
> > >> >> >> >
> > >> >> >> > Hmm. I was assuming that clusters where the admin didn't
> specify
> > >> any
> > >> >> >> > metadata.version during formatting would get the latest
> > >> >> metadata.version.
> > >> >> >> > Partly, because this is what we do for IBP today. It would be
> > >> good to
> > >> >> >> > clarify this...
> > >> >> >> >
> > >> >> >> > >
> > >> >> >> > > Also, it would be useful to clarify that if a
> > FeatureLevelRecord
> > >> >> exists
> > >> >> >> > for
> > >> >> >> > > metadata.version, the metadata.version in meta.properties
> will
> > >> be
> > >> >> >> > ignored.
> > >> >> >> > >
> > >> >> >> >
> > >> >> >> > Yeah, I agree.
> > >> >> >> >
> > >> >> >> > best,
> > >> >> >> > Colin
> > >> >> >> >
> > >> >> >> > > Thanks,
> > >> >> >> > >
> > >> >> >> > > Jun
> > >> >> >> > >
> > >> >> >> > >
> > >> >> >> > > On Tue, Nov 16, 2021 at 12:39 PM Colin McCabe <
> > >> cmcc...@apache.org>
> > >> >> >> > wrote:
> > >> >> >> > >
> > >> >> >> > >> On Fri, Nov 5, 2021, at 15:18, Jun Rao wrote:
> > >> >> >> > >> > Hi, David,
> > >> >> >> > >> >
> > >> >> >> > >> > Thanks for the reply.
> > >> >> >> > >> >
> > >> >> >> > >> > 16. My first concern is that the KIP picks up
> meta.version
> > >> >> >> > inconsistently
> > >> >> >> > >> > during the deployment. If a new cluster is started, we
> pick
> > >> up
> > >> >> the
> > >> >> >> > >> highest
> > >> >> >> > >> > version. If we upgrade, we leave the feature version
> > >> unchanged.
> > >> >> >> > >>
> > >> >> >> > >> Hi Jun,
> > >> >> >> > >>
> > >> >> >> > >> Thanks again for taking a look.
> > >> >> >> > >>
> > >> >> >> > >> The proposed behavior in KIP-778 is consistent with how it
> > >> works
> > >> >> >> today.
> > >> >> >> > >> Upgrading the software is distinct from upgrading the IBP.
> > >> >> >> > >>
> > >> >> >> > >> I think it is important to keep these two operations
> > >> ("upgrading
> > >> >> >> > >> IBP/metadata version" and "upgrading software version")
> > >> separate.
> > >> >> If
> > >> >> >> > they
> > >> >> >> > >> are coupled it will create a situation where software
> > upgrades
> > >> are
> > >> >> >> > >> difficult and dangerous.
> > >> >> >> > >>
> > >> >> >> > >> Consider a situation where you find some bug in your
> current
> > >> >> software,
> > >> >> >> > and
> > >> >> >> > >> you want to upgrade to new software that fixes the bug. If
> > >> >> upgrades
> > >> >> >> and
> > >> >> >> > IBP
> > >> >> >> > >> bumps are coupled, you can't do this without also bumping
> the
> > >> IBP,
> > >> >> >> > which is
> > >> >> >> > >> usually considered a high-risk change. That means that
> either
> > >> you
> > >> >> have
> > >> >> >> > to
> > >> >> >> > >> make a special build that includes only the fix
> > (time-consuming
> > >> >> and
> > >> >> >> > >> error-prone), live with the bug for longer, or be very
> > >> >> conservative
> > >> >> >> > about
> > >> >> >> > >> ever introducing new IBP/metadata versions. None of those
> are
> > >> >> really
> > >> >> >> > good
> > >> >> >> > >> choices.
> > >> >> >> > >>
> > >> >> >> > >> > Intuitively, it seems that independent of how a cluster
> is
> > >> >> deployed,
> > >> >> >> > we
> > >> >> >> > >> > should always pick the same feature version.
> > >> >> >> > >>
> > >> >> >> > >> I think it makes sense to draw a distinction between
> > upgrading
> > >> an
> > >> >> >> > existing
> > >> >> >> > >> cluster and deploying a new one. What most people want out
> of
> > >> >> upgrades
> > >> >> >> > is
> > >> >> >> > >> that things should keep working, but with bug fixes. If we
> > >> change
> > >> >> >> that,
> > >> >> >> > it
> > >> >> >> > >> just makes people more reluctant to upgrade (which is
> always
> > a
> > >> >> >> > problem...)
> > >> >> >> > >>
> > >> >> >> > >> > I think we need to think this through in this KIP. My
> > second
> > >> >> concern
> > >> >> >> > is
> > >> >> >> > >> > that as a particular version matures, it's inconvenient
> > for a
> > >> >> user
> > >> >> >> to
> > >> >> >> > >> manually
> > >> >> >> > >> > upgrade every feature version. As long as we have a path
> to
> > >> >> achieve
> > >> >> >> > that
> > >> >> >> > >> in
> > >> >> >> > >> > the future, we don't need to address that in this KIP.
> > >> >> >> > >>
> > >> >> >> > >> If people are managing a large number of Kafka clusters,
> they
> > >> will
> > >> >> >> want
> > >> >> >> > to
> > >> >> >> > >> do some sort of A/B testing with IBP/metadata versions. So
> if
> > >> you
> > >> >> have
> > >> >> >> > 1000
> > >> >> >> > >> Kafka clusters, you roll out the new IBP version to 10 of
> > them
> > >> >> and see
> > >> >> >> > how
> > >> >> >> > >> it goes. If that goes well, you roll it out to more, etc.
> > >> >> >> > >>
> > >> >> >> > >> So, the automation needs to be at the cluster management
> > layer,
> > >> >> not at
> > >> >> >> > the
> > >> >> >> > >> Kafka layer. Each Kafka cluster doesn't know how well
> things
> > >> went
> > >> >> in
> > >> >> >> the
> > >> >> >> > >> other 999 clusters. Automatically upgrading is a bad thing
> > for
> > >> the
> > >> >> >> same
> > >> >> >> > >> reason Kafka automatically upgrading its own software
> version
> > >> >> would
> > >> >> >> be a
> > >> >> >> > >> bad thing -- it could lead to a disruption to a sensitive
> > >> cluster
> > >> >> at
> > >> >> >> the
> > >> >> >> > >> wrong time.
> > >> >> >> > >>
> > >> >> >> > >> For people who are just managing one or two Kafka clusters,
> > >> >> >> > automatically
> > >> >> >> > >> upgrading feature versions isn't a big burden and can be
> done
> > >> >> >> manually.
> > >> >> >> > >> This is all consistent with how IBP works today.
> > >> >> >> > >>
> > >> >> >> > >> Also, we already have a command-line option to the feature
> > tool
> > >> >> which
> > >> >> >> > >> upgrades all features to the latest available, if that is
> > what
> > >> the
> > >> >> >> > >> administrator desires. Perhaps we could add documentation
> > >> saying
> > >> >> that
> > >> >> >> > this
> > >> >> >> > >> should be done as the last step of the upgrade.
> > >> >> >> > >>
> > >> >> >> > >> best,
> > >> >> >> > >> Colin
> > >> >> >> > >>
> > >> >> >> > >> >
> > >> >> >> > >> > 21. "./kafka-features.sh delete": Deleting a feature
> seems
> > a
> > >> bit
> > >> >> >> weird
> > >> >> >> > >> > since the logic is always there. Would it be better to
> use
> > >> >> disable?
> > >> >> >> > >> >
> > >> >> >> > >> > Jun
> > >> >> >> > >> >
> > >> >> >> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur
> > >> >> >> > >> > <david.art...@confluent.io.invalid> wrote:
> > >> >> >> > >> >
> > >> >> >> > >> >> Colin and Jun, thanks for the additional comments!
> > >> >> >> > >> >>
> > >> >> >> > >> >> Colin:
> > >> >> >> > >> >>
> > >> >> >> > >> >> > We've been talking about having an automated RPC
> > >> >> compatibility
> > >> >> >> > checker
> > >> >> >> > >> >>
> > >> >> >> > >> >> Do we have a way to mark fields in schemas as
> deprecated?
> > It
> > >> >> can
> > >> >> >> > stay in
> > >> >> >> > >> >> the RPC, it just complicates the logic a bit.
> > >> >> >> > >> >>
> > >> >> >> > >> >> > It would be nice if the active controller could
> validate
> > >> >> that a
> > >> >> >> > >> majority
> > >> >> >> > >> >> of the quorum could use the proposed metadata.version.
> The
> > >> >> active
> > >> >> >> > >> >> controller should have this information, right? If we
> > don't
> > >> >> have
> > >> >> >> > recent
> > >> >> >> > >> >> information  from a quorum of voters, we wouldn't be
> > active.
> > >> >> >> > >> >>
> > >> >> >> > >> >> I believe we should have this information from the
> > >> >> >> > ApiVersionsResponse.
> > >> >> >> > >> It
> > >> >> >> > >> >> would be good to do this validation to avoid a situation
> > >> where
> > >> >> a
> > >> >> >> > >> >> quorum leader can't be elected due to unprocessable
> > records.
> > >> >> >> > >> >>
> > >> >> >> > >> >> > Do we need delete as a command separate from
> downgrade?
> > >> >> >> > >> >>
> > >> >> >> > >> >> I think from an operator's perspective, it is nice to
> > >> >> distinguish
> > >> >> >> > >> between
> > >> >> >> > >> >> changing a feature flag and unsetting it. It might be
> > >> >> surprising to
> > >> >> >> > an
> > >> >> >> > >> >> operator to see the flag's version set to nothing when
> > they
> > >> >> >> requested
> > >> >> >> > >> the
> > >> >> >> > >> >> downgrade to version 0 (or less).
> > >> >> >> > >> >>
> > >> >> >> > >> >> > it seems like we should spell out that
> metadata.version
> > >> >> begins at
> > >> >> >> > 1 in
> > >> >> >> > >> >> KRaft clusters
> > >> >> >> > >> >>
> > >> >> >> > >> >> I added this text:
> > >> >> >> > >> >>
> > >> >> >> > >> >> Introduce an IBP version to indicate the lowest software
> > >> >> version
> > >> >> >> that
> > >> >> >> > >> >> > supports *metadata.version*. Below this IBP, the
> > >> >> >> > *metadata.version* is
> > >> >> >> > >> >> > undefined and will not be examined. At or above this
> > IBP,
> > >> the
> > >> >> >> > >> >> > *metadata.version* must be *0* for ZooKeeper clusters
> > and
> > >> >> will be
> > >> >> >> > >> >> > initialized as *1* for KRaft clusters.
> > >> >> >> > >> >>
> > >> >> >> > >> >>
> > >> >> >> > >> >> > We probably also want an RPC implemented by both
> brokers
> > >> and
> > >> >> >> > >> controllers
> > >> >> >> > >> >> that will reveal the min and max supported versions for
> > each
> > >> >> >> feature
> > >> >> >> > >> level
> > >> >> >> > >> >> supported by the server
> > >> >> >> > >> >>
> > >> >> >> > >> >> This is available in ApiVersionsResponse (we include the
> > >> >> server's
> > >> >> >> > >> supported
> > >> >> >> > >> >> features as well as the cluster's finalized features)
> > >> >> >> > >> >>
> > >> >> >> > >> >> --------
> > >> >> >> > >> >>
> > >> >> >> > >> >> Jun:
> > >> >> >> > >> >>
> > >> >> >> > >> >> 12. I've updated the KIP with AdminClient changes
> > >> >> >> > >> >>
> > >> >> >> > >> >> 14. You're right, it looks like I missed a few sections
> > >> >> regarding
> > >> >> >> > >> snapshot
> > >> >> >> > >> >> generation. I've corrected it
> > >> >> >> > >> >>
> > >> >> >> > >> >> 16. This feels more like an enhancement to KIP-584. I
> > agree
> > >> it
> > >> >> >> could
> > >> >> >> > be
> > >> >> >> > >> >> useful, but perhaps we could address it separately from
> > >> KRaft
> > >> >> >> > upgrades?
> > >> >> >> > >> >>
> > >> >> >> > >> >> 20. Indeed snapshots are not strictly necessary during
> an
> > >> >> upgrade,
> > >> >> >> > I've
> > >> >> >> > >> >> reworded this
> > >> >> >> > >> >>
> > >> >> >> > >> >>
> > >> >> >> > >> >> Thanks!
> > >> >> >> > >> >> David
> > >> >> >> > >> >>
> > >> >> >> > >> >>
> > >> >> >> > >> >> On Thu, Nov 4, 2021 at 6:51 PM Jun Rao
> > >> >> <j...@confluent.io.invalid>
> > >> >> >> > >> wrote:
> > >> >> >> > >> >>
> > >> >> >> > >> >> > Hi, David, Jose and Colin,
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > Thanks for the reply. A few more comments.
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > 12. It seems that we haven't updated the AdminClient
> > >> >> accordingly?
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > 14. "Metadata snapshot is generated and sent to the
> > other
> > >> >> >> inactive
> > >> >> >> > >> >> > controllers and to brokers". I thought we wanted each
> > >> broker
> > >> >> to
> > >> >> >> > >> generate
> > >> >> >> > >> >> > its own snapshot independently? If only the controller
> > >> >> generates
> > >> >> >> > the
> > >> >> >> > >> >> > snapshot, how do we force other brokers to pick it up?
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > 16. If a feature version is new, one may not want to
> > >> enable
> > >> >> it
> > >> >> >> > >> >> immediately
> > >> >> >> > >> >> > after the cluster is upgraded. However, if a feature
> > >> version
> > >> >> has
> > >> >> >> > been
> > >> >> >> > >> >> > stable, requiring every user to run a command to
> upgrade
> > >> to
> > >> >> that
> > >> >> >> > >> version
> > >> >> >> > >> >> > seems inconvenient. One way to improve this is for
> each
> > >> >> feature
> > >> >> >> to
> > >> >> >> > >> define
> > >> >> >> > >> >> > one version as the default. Then, when we upgrade a
> > >> cluster,
> > >> >> we
> > >> >> >> > will
> > >> >> >> > >> >> > automatically upgrade the feature to the default
> > version.
> > >> An
> > >> >> >> admin
> > >> >> >> > >> could
> > >> >> >> > >> >> > use the tool to upgrade to a version higher than the
> > >> default.
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > 20. "The quorum controller can assist with this
> process
> > by
> > >> >> >> > generating
> > >> >> >> > >> a
> > >> >> >> > >> >> > metadata snapshot after a metadata.version increase
> has
> > >> been
> > >> >> >> > >> committed to
> > >> >> >> > >> >> > the metadata log. This snapshot will be a convenient
> way
> > >> to
> > >> >> let
> > >> >> >> > broker
> > >> >> >> > >> >> and
> > >> >> >> > >> >> > controller components rebuild their entire in-memory
> > state
> > >> >> >> > following
> > >> >> >> > >> an
> > >> >> >> > >> >> > upgrade." The new version of the software could read
> > both
> > >> >> the new
> > >> >> >> > and
> > >> >> >> > >> the
> > >> >> >> > >> >> > old version. Is generating a new snapshot during
> upgrade
> > >> >> needed?
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > Jun
> > >> >> >> > >> >> >
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <
> > >> >> cmcc...@apache.org>
> > >> >> >> > >> wrote:
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> > >> >> >> > >> >> > > > Hi, David,
> > >> >> >> > >> >> > > >
> > >> >> >> > >> >> > > > One more comment.
> > >> >> >> > >> >> > > >
> > >> >> >> > >> >> > > > 16. The main reason why KIP-584 requires
> finalizing
> > a
> > >> >> feature
> > >> >> >> > >> >> manually
> > >> >> >> > >> >> > is
> > >> >> >> > >> >> > > > that in the ZK world, the controller doesn't know
> > all
> > >> >> brokers
> > >> >> >> > in a
> > >> >> >> > >> >> > > cluster.
> > >> >> >> > >> >> > > > A broker temporarily down is not registered in ZK.
> > in
> > >> the
> > >> >> >> KRaft
> > >> >> >> > >> >> world,
> > >> >> >> > >> >> > > the
> > >> >> >> > >> >> > > > controller keeps track of all brokers, including
> > those
> > >> >> that
> > >> >> >> are
> > >> >> >> > >> >> > > temporarily
> > >> >> >> > >> >> > > > down. This makes it possible for the controller to
> > >> >> >> > automatically
> > >> >> >> > >> >> > > finalize a
> > >> >> >> > >> >> > > > feature---it's safe to do so when all brokers
> > support
> > >> >> that
> > >> >> >> > >> feature.
> > >> >> >> > >> >> > This
> > >> >> >> > >> >> > > > will make the upgrade process much simpler since
> no
> > >> >> manual
> > >> >> >> > >> command is
> > >> >> >> > >> >> > > > required to turn on a new feature. Have we
> > considered
> > >> >> this?
> > >> >> >> > >> >> > > >
> > >> >> >> > >> >> > > > Thanks,
> > >> >> >> > >> >> > > >
> > >> >> >> > >> >> > > > Jun
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> > > Hi Jun,
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> > > I guess David commented on this point already, but
> > I'll
> > >> >> comment
> > >> >> >> > as
> > >> >> >> > >> >> well.
> > >> >> >> > >> >> > I
> > >> >> >> > >> >> > > always had the perception that users viewed rolls as
> > >> >> >> potentially
> > >> >> >> > >> risky
> > >> >> >> > >> >> > and
> > >> >> >> > >> >> > > were looking for ways to reduce the risk. Not
> enabling
> > >> >> features
> > >> >> >> > >> right
> > >> >> >> > >> >> > away
> > >> >> >> > >> >> > > after installing new software seems like one way to
> do
> > >> >> that. If
> > >> >> >> > we
> > >> >> >> > >> had
> > >> >> >> > >> >> a
> > >> >> >> > >> >> > > feature to automatically upgrade during a roll, I'm
> > not
> > >> >> sure
> > >> >> >> > that I
> > >> >> >> > >> >> would
> > >> >> >> > >> >> > > recommend that people use it, because if something
> > >> fails,
> > >> >> it
> > >> >> >> > makes
> > >> >> >> > >> it
> > >> >> >> > >> >> > > harder to tell if the new feature is at fault, or
> > >> something
> > >> >> >> else
> > >> >> >> > in
> > >> >> >> > >> the
> > >> >> >> > >> >> > new
> > >> >> >> > >> >> > > software.
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> > > We already tell users to do a "double roll" when
> going
> > >> to
> > >> >> a new
> > >> >> >> > IBP.
> > >> >> >> > >> >> > (Just
> > >> >> >> > >> >> > > to give background to people who haven't heard that
> > >> >> phrase, the
> > >> >> >> > >> first
> > >> >> >> > >> >> > roll
> > >> >> >> > >> >> > > installs the new software, and the second roll
> updates
> > >> the
> > >> >> >> IBP).
> > >> >> >> > So
> > >> >> >> > >> >> this
> > >> >> >> > >> >> > > KIP-778 mechanism is basically very similar to that,
> > >> >> except the
> > >> >> >> > >> second
> > >> >> >> > >> >> > > thing isn't a roll, but just an upgrade command. So
> I
> > >> think
> > >> >> >> this
> > >> >> >> > is
> > >> >> >> > >> >> > > consistent with what we currently do.
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> > > Also, just like David said, we can always add
> > >> auto-upgrade
> > >> >> >> later
> > >> >> >> > if
> > >> >> >> > >> >> there
> > >> >> >> > >> >> > > is demand...
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> > > best,
> > >> >> >> > >> >> > > Colin
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> > > >
> > >> >> >> > >> >> > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <
> > >> j...@confluent.io
> > >> >> >
> > >> >> >> > wrote:
> > >> >> >> > >> >> > > >
> > >> >> >> > >> >> > > >> Hi, David,
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> Thanks for the KIP. A few comments below.
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> 10. It would be useful to describe how the
> > controller
> > >> >> node
> > >> >> >> > >> >> determines
> > >> >> >> > >> >> > > the
> > >> >> >> > >> >> > > >> RPC version used to communicate to other
> controller
> > >> >> nodes.
> > >> >> >> > There
> > >> >> >> > >> >> seems
> > >> >> >> > >> >> > > to
> > >> >> >> > >> >> > > >> be a bootstrap problem. A controller node can't
> > read
> > >> >> the log
> > >> >> >> > and
> > >> >> >> > >> >> > > >> therefore the feature level until a quorum leader
> > is
> > >> >> >> elected.
> > >> >> >> > But
> > >> >> >> > >> >> > leader
> > >> >> >> > >> >> > > >> election requires an RPC.
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> 11. For downgrades, it would be useful to
> describe
> > >> how
> > >> >> to
> > >> >> >> > >> determine
> > >> >> >> > >> >> > the
> > >> >> >> > >> >> > > >> downgrade process (generating new snapshot,
> > >> propagating
> > >> >> the
> > >> >> >> > >> >> snapshot,
> > >> >> >> > >> >> > > etc)
> > >> >> >> > >> >> > > >> has completed. We could block the UpdateFeature
> > >> request
> > >> >> >> until
> > >> >> >> > the
> > >> >> >> > >> >> > > process
> > >> >> >> > >> >> > > >> is completed. However, since the process could
> take
> > >> >> time,
> > >> >> >> the
> > >> >> >> > >> >> request
> > >> >> >> > >> >> > > could
> > >> >> >> > >> >> > > >> time out. Another way is through DescribeFeature
> > and
> > >> the
> > >> >> >> > server
> > >> >> >> > >> only
> > >> >> >> > >> >> > > >> reports downgraded versions after the process is
> > >> >> completed.
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> 12. Since we are changing UpdateFeaturesRequest,
> do
> > >> we
> > >> >> need
> > >> >> >> to
> > >> >> >> > >> >> change
> > >> >> >> > >> >> > > the
> > >> >> >> > >> >> > > >> AdminClient api for updateFeatures too?
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> 13. For the paragraph starting with "In the
> absence
> > >> of
> > >> >> an
> > >> >> >> > >> operator
> > >> >> >> > >> >> > > >> defined value for metadata.version", in KIP-584,
> we
> > >> >> >> described
> > >> >> >> > >> how to
> > >> >> >> > >> >> > > >> finalize features with New cluster bootstrap. In
> > that
> > >> >> case,
> > >> >> >> > it's
> > >> >> >> > >> >> > > >> inconvenient for the users to have to run an
> admin
> > >> tool
> > >> >> to
> > >> >> >> > >> finalize
> > >> >> >> > >> >> > the
> > >> >> >> > >> >> > > >> version for each feature. Instead, the system
> > detects
> > >> >> that
> > >> >> >> the
> > >> >> >> > >> >> > /features
> > >> >> >> > >> >> > > >> path is missing in ZK and thus automatically
> > >> finalizes
> > >> >> every
> > >> >> >> > >> feature
> > >> >> >> > >> >> > > with
> > >> >> >> > >> >> > > >> the latest supported version. Could we do
> something
> > >> >> similar
> > >> >> >> in
> > >> >> >> > >> the
> > >> >> >> > >> >> > KRaft
> > >> >> >> > >> >> > > >> mode?
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> 14. After the quorum leader generates a new
> > snapshot,
> > >> >> how do
> > >> >> >> > we
> > >> >> >> > >> >> force
> > >> >> >> > >> >> > > >> other nodes to pick up the new snapshot?
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> 15. I agree with Jose that it will be useful to
> > >> describe
> > >> >> >> when
> > >> >> >> > >> >> > > generating a
> > >> >> >> > >> >> > > >> new snapshot is needed. To me, it seems the new
> > >> >> snapshot is
> > >> >> >> > only
> > >> >> >> > >> >> > needed
> > >> >> >> > >> >> > > >> when incompatible changes are made.
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> 7. Jose, what control records were you referring?
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> Thanks,
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> Jun
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <
> > >> >> >> > >> davidart...@apache.org
> > >> >> >> > >> >> >
> > >> >> >> > >> >> > > >> wrote:
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > > >>> Jose, thanks for the thorough review and
> comments!
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> I am out of the office until next week, so I
> > >> probably
> > >> >> won't
> > >> >> >> > be
> > >> >> >> > >> able
> > >> >> >> > >> >> > to
> > >> >> >> > >> >> > > >>> update the KIP until then. Here are some replies
> > to
> > >> >> your
> > >> >> >> > >> questions:
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> 1. Generate snapshot on upgrade
> > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and sent to
> the
> > >> >> other
> > >> >> >> > nodes
> > >> >> >> > >> >> > > >>> > Why does the Active Controller need to
> generate
> > a
> > >> new
> > >> >> >> > snapshot
> > >> >> >> > >> >> and
> > >> >> >> > >> >> > > >>> > force a snapshot fetch from the replicas
> > (inactive
> > >> >> >> > controller
> > >> >> >> > >> and
> > >> >> >> > >> >> > > >>> > brokers) on an upgrade? Isn't writing the
> > >> >> >> > FeatureLevelRecord
> > >> >> >> > >> good
> > >> >> >> > >> >> > > >>> > enough to communicate the upgrade to the
> > replicas?
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> You're right, we don't necessarily need to
> > >> _transmit_ a
> > >> >> >> > >> snapshot,
> > >> >> >> > >> >> > since
> > >> >> >> > >> >> > > >>> each node can generate its own equivalent
> snapshot
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> 2. Generate snapshot on downgrade
> > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and sent to
> the
> > >> >> other
> > >> >> >> > >> inactive
> > >> >> >> > >> >> > > >>> > controllers and to brokers (this snapshot may
> be
> > >> >> lossy!)
> > >> >> >> > >> >> > > >>> > Why do we need to send this downgraded
> snapshot
> > to
> > >> >> the
> > >> >> >> > >> brokers?
> > >> >> >> > >> >> The
> > >> >> >> > >> >> > > >>> > replicas have seen the FeatureLevelRecord and
> > >> >> noticed the
> > >> >> >> > >> >> > downgrade.
> > >> >> >> > >> >> > > >>> > Can we have the replicas each independently
> > >> generate
> > >> >> a
> > >> >> >> > >> downgraded
> > >> >> >> > >> >> > > >>> > snapshot at the offset for the downgraded
> > >> >> >> > FeatureLevelRecord?
> > >> >> >> > >> I
> > >> >> >> > >> >> > > assume
> > >> >> >> > >> >> > > >>> > that the active controller will guarantee that
> > all
> > >> >> >> records
> > >> >> >> > >> after
> > >> >> >> > >> >> > the
> > >> >> >> > >> >> > > >>> > FatureLevelRecord use the downgraded version.
> If
> > >> so,
> > >> >> it
> > >> >> >> > would
> > >> >> >> > >> be
> > >> >> >> > >> >> > good
> > >> >> >> > >> >> > > >>> > to mention that explicitly.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> Similar to above, yes a broker that detects a
> > >> >> downgrade via
> > >> >> >> > >> >> > > >>> FeatureLevelRecord could generate its own
> > downgrade
> > >> >> >> snapshot
> > >> >> >> > and
> > >> >> >> > >> >> > reload
> > >> >> >> > >> >> > > >>> its
> > >> >> >> > >> >> > > >>> state from that. This does get a little fuzzy
> when
> > >> we
> > >> >> >> > consider
> > >> >> >> > >> >> cases
> > >> >> >> > >> >> > > where
> > >> >> >> > >> >> > > >>> brokers are on different software versions and
> > >> could be
> > >> >> >> > >> generating
> > >> >> >> > >> >> a
> > >> >> >> > >> >> > > >>> downgrade snapshot for version X, but using
> > >> different
> > >> >> >> > versions
> > >> >> >> > >> of
> > >> >> >> > >> >> the
> > >> >> >> > >> >> > > >>> code.
> > >> >> >> > >> >> > > >>> It might be safer to let the controller generate
> > the
> > >> >> >> > snapshot so
> > >> >> >> > >> >> each
> > >> >> >> > >> >> > > >>> broker (regardless of software version) gets the
> > >> same
> > >> >> >> > records.
> > >> >> >> > >> >> > However,
> > >> >> >> > >> >> > > >>> for
> > >> >> >> > >> >> > > >>> upgrades (or downgrades) we expect the whole
> > cluster
> > >> >> to be
> > >> >> >> > >> running
> > >> >> >> > >> >> > the
> > >> >> >> > >> >> > > >>> same
> > >> >> >> > >> >> > > >>> software version before triggering the
> > >> metadata.version
> > >> >> >> > change,
> > >> >> >> > >> so
> > >> >> >> > >> >> > > perhaps
> > >> >> >> > >> >> > > >>> this isn't a likely scenario. Thoughts?
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> 3. Max metadata version
> > >> >> >> > >> >> > > >>> > >For the first release that supports
> > >> >> metadata.version, we
> > >> >> >> > can
> > >> >> >> > >> >> > simply
> > >> >> >> > >> >> > > >>> > initialize metadata.version with the current
> > (and
> > >> >> only)
> > >> >> >> > >> version.
> > >> >> >> > >> >> > For
> > >> >> >> > >> >> > > >>> future
> > >> >> >> > >> >> > > >>> > releases, we will need a mechanism to
> bootstrap
> > a
> > >> >> >> > particular
> > >> >> >> > >> >> > version.
> > >> >> >> > >> >> > > >>> This
> > >> >> >> > >> >> > > >>> > could be done using the meta.properties file
> or
> > >> some
> > >> >> >> > similar
> > >> >> >> > >> >> > > mechanism.
> > >> >> >> > >> >> > > >>> The
> > >> >> >> > >> >> > > >>> > reason we need the allow for a specific
> initial
> > >> >> version
> > >> >> >> is
> > >> >> >> > to
> > >> >> >> > >> >> > support
> > >> >> >> > >> >> > > >>> the
> > >> >> >> > >> >> > > >>> > use case of starting a Kafka cluster at
> version
> > X
> > >> >> with an
> > >> >> >> > >> older
> > >> >> >> > >> >> > > >>> > metadata.version.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> I assume that the Active Controller will learn
> the
> > >> >> metadata
> > >> >> >> > >> version
> > >> >> >> > >> >> > of
> > >> >> >> > >> >> > > >>> > the broker through the
> > BrokerRegistrationRequest.
> > >> How
> > >> >> >> will
> > >> >> >> > the
> > >> >> >> > >> >> > Active
> > >> >> >> > >> >> > > >>> > Controller learn about the max metadata
> version
> > of
> > >> >> the
> > >> >> >> > >> inactive
> > >> >> >> > >> >> > > >>> > controller nodes? We currently don't send a
> > >> >> registration
> > >> >> >> > >> request
> > >> >> >> > >> >> > from
> > >> >> >> > >> >> > > >>> > the inactive controller to the active
> > controller.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> This came up during the design, but I neglected
> to
> > >> add
> > >> >> it
> > >> >> >> to
> > >> >> >> > the
> > >> >> >> > >> >> KIP.
> > >> >> >> > >> >> > > We
> > >> >> >> > >> >> > > >>> will need a mechanism for determining the
> > supported
> > >> >> >> features
> > >> >> >> > of
> > >> >> >> > >> >> each
> > >> >> >> > >> >> > > >>> controller similar to how brokers use
> > >> >> >> > BrokerRegistrationRequest.
> > >> >> >> > >> >> > > Perhaps
> > >> >> >> > >> >> > > >>> controllers could write a FeatureLevelRecord (or
> > >> >> similar)
> > >> >> >> to
> > >> >> >> > the
> > >> >> >> > >> >> > > metadata
> > >> >> >> > >> >> > > >>> log indicating their supported version. WDYT?
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> Why do you need to bootstrap a particular
> version?
> > >> >> Isn't
> > >> >> >> the
> > >> >> >> > >> intent
> > >> >> >> > >> >> > > >>> > that the broker will learn the active metadata
> > >> >> version by
> > >> >> >> > >> reading
> > >> >> >> > >> >> > the
> > >> >> >> > >> >> > > >>> > metadata before unfencing?
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> This bootstrapping is needed for when a KRaft
> > >> cluster
> > >> >> is
> > >> >> >> > first
> > >> >> >> > >> >> > > started. If
> > >> >> >> > >> >> > > >>> we don't have this mechanism, the cluster can't
> > >> really
> > >> >> do
> > >> >> >> > >> anything
> > >> >> >> > >> >> > > until
> > >> >> >> > >> >> > > >>> the operator finalizes the metadata.version with
> > the
> > >> >> tool.
> > >> >> >> > The
> > >> >> >> > >> >> > > >>> bootstrapping will be done by the controller and
> > the
> > >> >> >> brokers
> > >> >> >> > >> will
> > >> >> >> > >> >> see
> > >> >> >> > >> >> > > this
> > >> >> >> > >> >> > > >>> version as a record (like you say). I'll add
> some
> > >> text
> > >> >> to
> > >> >> >> > >> clarify
> > >> >> >> > >> >> > this.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> 4. Reject Registration - This is related to the
> > >> bullet
> > >> >> >> point
> > >> >> >> > >> above.
> > >> >> >> > >> >> > > >>> > What will be the behavior of the active
> > controller
> > >> >> if the
> > >> >> >> > >> broker
> > >> >> >> > >> >> > > sends
> > >> >> >> > >> >> > > >>> > a metadata version that is not compatible with
> > the
> > >> >> >> cluster
> > >> >> >> > >> wide
> > >> >> >> > >> >> > > >>> > metadata version?
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> If a broker starts up with a lower supported
> > version
> > >> >> range
> > >> >> >> > than
> > >> >> >> > >> the
> > >> >> >> > >> >> > > >>> current
> > >> >> >> > >> >> > > >>> cluster metadata.version, it should log an error
> > and
> > >> >> >> > shutdown.
> > >> >> >> > >> This
> > >> >> >> > >> >> > is
> > >> >> >> > >> >> > > in
> > >> >> >> > >> >> > > >>> line with KIP-584.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> 5. Discover upgrade
> > >> >> >> > >> >> > > >>> > > This snapshot will be a convenient way to
> let
> > >> >> broker
> > >> >> >> and
> > >> >> >> > >> >> > controller
> > >> >> >> > >> >> > > >>> > components rebuild their entire in-memory
> state
> > >> >> following
> > >> >> >> > an
> > >> >> >> > >> >> > upgrade.
> > >> >> >> > >> >> > > >>> > Can we rely on the presence of the
> > >> >> FeatureLevelRecord for
> > >> >> >> > the
> > >> >> >> > >> >> > > metadata
> > >> >> >> > >> >> > > >>> > version for this functionality? If so, it
> avoids
> > >> >> having
> > >> >> >> to
> > >> >> >> > >> reload
> > >> >> >> > >> >> > the
> > >> >> >> > >> >> > > >>> > snapshot.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> For upgrades, yes probably since we won't need
> to
> > >> >> "rewrite"
> > >> >> >> > any
> > >> >> >> > >> >> > > records in
> > >> >> >> > >> >> > > >>> this case. For downgrades, we will need to
> > generate
> > >> the
> > >> >> >> > snapshot
> > >> >> >> > >> >> and
> > >> >> >> > >> >> > > >>> reload
> > >> >> >> > >> >> > > >>> everything.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> 6. Metadata version specification
> > >> >> >> > >> >> > > >>> > >  V4(version=4, isBackwardsCompatible=false,
> > >> >> >> > description="New
> > >> >> >> > >> >> > > metadata
> > >> >> >> > >> >> > > >>> > record type Bar"),
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> Very cool. Do you have plans to generate Apache
> > >> Kafka
> > >> >> HTML
> > >> >> >> > >> >> > > >>> > documentation for this information? Would be
> > >> helpful
> > >> >> to
> > >> >> >> > >> display
> > >> >> >> > >> >> > this
> > >> >> >> > >> >> > > >>> > information to the user using the
> > >> kafka-features.sh
> > >> >> and
> > >> >> >> > >> feature
> > >> >> >> > >> >> > RPC?
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> Hm good idea :) I'll add a brief section on
> > >> >> documentation.
> > >> >> >> > This
> > >> >> >> > >> >> would
> > >> >> >> > >> >> > > >>> certainly be very useful
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> 7.Downgrade records
> > >> >> >> > >> >> > > >>> > I think we should explicitly mention that the
> > >> >> downgrade
> > >> >> >> > >> process
> > >> >> >> > >> >> > will
> > >> >> >> > >> >> > > >>> > downgrade both metadata records and controller
> > >> >> records.
> > >> >> >> In
> > >> >> >> > >> >> KIP-630
> > >> >> >> > >> >> > we
> > >> >> >> > >> >> > > >>> > introduced two control records for snapshots.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> Yes, good call. Let me re-read that KIP and
> > include
> > >> >> some
> > >> >> >> > >> details.
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> Thanks again for the comments!
> > >> >> >> > >> >> > > >>> -David
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando
> > García
> > >> >> Sancio
> > >> >> >> > >> >> > > >>> <jsan...@confluent.io.invalid> wrote:
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>> > One more comment.
> > >> >> >> > >> >> > > >>> >
> > >> >> >> > >> >> > > >>> > 7.Downgrade records
> > >> >> >> > >> >> > > >>> > I think we should explicitly mention that the
> > >> >> downgrade
> > >> >> >> > >> process
> > >> >> >> > >> >> > will
> > >> >> >> > >> >> > > >>> > downgrade both metadata records and controller
> > >> >> records.
> > >> >> >> In
> > >> >> >> > >> >> KIP-630
> > >> >> >> > >> >> > we
> > >> >> >> > >> >> > > >>> > introduced two control records for snapshots.
> > >> >> >> > >> >> > > >>> >
> > >> >> >> > >> >> > > >>>
> > >> >> >> > >> >> > > >>
> > >> >> >> > >> >> > >
> > >> >> >> > >> >> >
> > >> >> >> > >> >>
> > >> >> >> > >> >>
> > >> >> >> > >> >> --
> > >> >> >> > >> >> -David
> > >> >> >> > >> >>
> > >> >> >> > >>
> > >> >> >> >
> > >> >> >>
> > >> >>
> > >> >
> > >>
> > >
> > >
> > > --
> > > -David
> > >
> >
> >
> > --
> > -David
> >
>


-- 
-David
