Hi, David,

Thanks for the reply. Just one more minor comment.

30. ./kafka-features.sh upgrade: It seems that the release param is
optional. Could you describe the semantic when release is not specified?

Jun

On Mon, Nov 29, 2021 at 5:06 PM David Arthur
<david.art...@confluent.io.invalid> wrote:

> Jun, I updated the KIP with the "disable" CLI.
>
> For 16, I think you're asking how we can safely introduce the
> initial version of other feature flags in the future. This will probably
> depend on the nature of the feature that the new flag is gating. We can
> probably take a similar approach and say version 1 is backwards compatible
> to some point (possibly an IBP or metadata.version?).
>
> -David
>
> On Fri, Nov 19, 2021 at 3:00 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, David,
> >
> > Thanks for the reply. The new CLI sounds reasonable to me.
> >
> > 16.
> > For case C, choosing the latest version sounds good to me.
> > For case B, for metadata.version, we pick version 1 since it just happens
> > that for metadata.version version 1 is backward compatible. How do we
> make
> > this more general for other features?
> >
> > 21. Do you still plan to change "delete" to "disable" in the CLI?
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Thu, Nov 18, 2021 at 11:50 AM David Arthur
> > <david.art...@confluent.io.invalid> wrote:
> >
> > > Jun,
> > >
> > > The KIP has some changes to the CLI for KIP-584. With Jason's
> suggestion
> > > incorporated, these three commands would look like:
> > >
> > > 1) kafka-features.sh upgrade --release latest
> > > upgrades all known features to their defaults in the latest release
> > >
> > > 2) kafka-features.sh downgrade --release 3.x
> > > downgrade all known features to the default versions from 3.x
> > >
> > > 3) kafka-features.sh describe --release latest
> > > Describe the known features of the latest release
> > >
> > > The --upgrade-all and --downgrade-all are replaced by specific each
> > > feature+version or giving the --release argument. One problem with
> > > --downgrade-all is it's not clear what the target versions should be:
> the
> > > previous version before the last upgrade -- or the lowest supported
> > > version. Since downgrades will be less common, I think it's better to
> > make
> > > the operator be more explicit about the desired downgrade version
> (either
> > > through [--feature X --version Y] or [--release 3.1]). Does that seem
> > > reasonable?
> > >
> > > 16. For case C, I think we will want to always use the latest
> > > metadata.version for new clusters (like we do for IBP). We can always
> > > change what we mean by "default" down the road. Also, this will likely
> > > become a bit of information we include in release and upgrade notes
> with
> > > each release.
> > >
> > > -David
> > >
> > > On Thu, Nov 18, 2021 at 1:14 PM Jun Rao <j...@confluent.io.invalid>
> > wrote:
> > >
> > > > Hi, Jason, David,
> > > >
> > > > Just to clarify on the interaction with the end user, the design in
> > > KIP-584
> > > > allows one to do the following.
> > > > (1) kafka-features.sh  --upgrade-all --bootstrap-server
> > > > kafka-broker0.prn1:9071 to upgrade all features to the latest version
> > > known
> > > > by the tool. The user doesn't need to know a specific feature
> version.
> > > > (2) kafka-features.sh  --downgrade-all --bootstrap-server
> > > > kafka-broker0.prn1:9071 to downgrade all features to the latest
> version
> > > > known by the tool. The user doesn't need to know a specific feature
> > > > version.
> > > > (3) kafka-features.sh  --describe --bootstrap-server
> > > > kafka-broker0.prn1:9071 to find out the supported version for each
> > > feature.
> > > > This allows the user to upgrade each feature individually.
> > > >
> > > > Most users will be doing (1) and occasionally (2), and won't need to
> > know
> > > > the exact version of each feature.
> > > >
> > > > 16. For case C, what's the default version? Is it always the latest?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Nov 18, 2021 at 8:15 AM David Arthur
> > > > <david.art...@confluent.io.invalid> wrote:
> > > >
> > > > > Colin, thanks for the detailed response. I understand what you're
> > > saying
> > > > > and I agree with your rationale.
> > > > >
> > > > > It seems like we could just initialize their cluster.metadata to 1
> > when
> > > > the
> > > > > > software is upgraded to 3.2.
> > > > > >
> > > > >
> > > > > Concretely, this means the controller would see that there is no
> > > > > FeatureLevelRecord in the log, and so it would bootstrap one.
> > Normally
> > > > for
> > > > > cluster initialization, this would be read from meta.properties,
> but
> > in
> > > > the
> > > > > case of preview clusters we would need to special case it to
> > initialize
> > > > the
> > > > > version to 1.
> > > > >
> > > > > Once the new FeatureLevelRecord has been committed, any nodes
> > (brokers
> > > or
> > > > > controllers) that are running the latest software will react to the
> > new
> > > > > metadata.version. This means we will need to make this initial
> > version
> > > > of 1
> > > > > be backwards compatible to what we have in 3.0 and 3.1 (since some
> > > > brokers
> > > > > will be on the new software and some on older/preview versions)
> > > > >
> > > > > I agree that auto-upgrading preview users from IBP 3.0 to
> > > > metadata.version
> > > > > 1 (equivalent to IBP 3.2) is probably fine.
> > > > >
> > > > > Back to Jun's case B:
> > > > >
> > > > > b. We upgrade from an old version where no metadata.version has
> been
> > > > > > finalized. In this case, it makes sense to leave metadata.version
> > > > > disabled
> > > > > > since we don't know if all brokers have been upgraded.
> > > > >
> > > > >
> > > > > Instead of leaving it uninitialized, we would initialize it to 1
> > which
> > > > > would be backwards compatible to 3.0 and 3.1. This would eliminate
> > the
> > > > need
> > > > > to check a "final IBP" or deal with 3.2+ clusters without a
> > > > > metadata.version set. The downgrade path for 3.2 back to a preview
> > > > release
> > > > > should be fine since we are saying that metadata.version 1 is
> > > compatible
> > > > > with those releases. The FeatureLevelRecord would exist, but it
> would
> > > be
> > > > > ignored (we might need to make sure this actually works in 3.0).
> > > > >
> > > > > For clarity, here are the three workflows of Jun's three cases:
> > > > >
> > > > > A (KRaft 3.2+ to KRaft 3.2+):
> > > > > * User initializes cluster with an explicit metadata.version X, or
> > the
> > > > tool
> > > > > selects the default
> > > > > * After upgrade, cluster stays at version X until operator upgrades
> > to
> > > Y
> > > > >
> > > > > B (KRaft Preview to KRaft 3.2+):
> > > > > * User initializes cluster without metadata.version
> > > > > * After controller leader is upgraded, initializes metadata.version
> > to
> > > 1
> > > > > * After the cluster is upgraded, an operator may further upgrade to
> > > > version
> > > > > Y
> > > > >
> > > > > C (New KRaft 3.2+):
> > > > > * User initializes cluster with an explicit metadata.version X, or
> > the
> > > > tool
> > > > > selects the default
> > > > >
> > > > >
> > > > > This has been mentioned in this thread, but we should explicitly
> call
> > > out
> > > > > that the absence of metadata.version in meta.properties will be
> used
> > > > > to identify that we are in case B. After 3.2, we will require that
> > > > > metadata.version is present in meta.properties for new clusters. If
> > > users
> > > > > omit the --metadata-version flag from kafka-storage.sh, we should
> > fill
> > > > in a
> > > > > default.
> > > > >
> > > > > So how about the following changes/clarifications to the KIP:
> > > > >
> > > > > * Starting with 3.2, metadata.version is required in KRaft, IBP is
> > > > ignored
> > > > > * If meta.properties does not include metadata.version, it
> indicates
> > > this
> > > > > cluster was initialized with a preview release
> > > > > * If upgrading from a preview release to 3.2+, the controller will
> > > > > initialize metadata.version=1
> > > > > * metadata.version=1 is backwards compatible with preview releases
> > > > >
> > > > > -David
> > > > >
> > > > >
> > > > > On Thu, Nov 18, 2021 at 10:02 AM David Arthur <
> > > david.art...@confluent.io
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Jason,
> > > > > >
> > > > > > 1/2. You've got it right. The intention is that metadata.version
> > will
> > > > > gate
> > > > > > both RPCs (like IBP does) as well as metadata records. So, when a
> > > > broker
> > > > > > sees that metadata.version changed, it may start advertising new
> > RPCs
> > > > and
> > > > > > it will need to refresh the ApiVersions it has for other
> brokers. I
> > > can
> > > > > try
> > > > > > to make this more explicit in the KIP.
> > > > > >
> > > > > > 3. Yes, that's the intention of the --dry-run flag. The current
> > > > > > implementation in kafka-features.sh does a dry run on the client
> > > side,
> > > > > but
> > > > > > this KIP pushes the validation down to the controller. This will
> > > allow
> > > > us
> > > > > > to have the context needed to do proper validation
> > > > > >
> > > > > > Re: version number complexity -- yes this has come up in offline
> > > > > > discussions. With just one feature flag to manage, it's not so
> bad,
> > > but
> > > > > > explicitly managing even a few would be a burden. I like your
> > > > suggestion
> > > > > of
> > > > > > the "--release" flag. That could act as a sort of manifest of
> > > versions
> > > > > > (i.e., the defaults from that release). We could also support
> > > something
> > > > > > like "kafka-features upgrade --release latest" to bring
> everything
> > to
> > > > the
> > > > > > highest supported version.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Nov 17, 2021 at 9:36 PM Jason Gustafson
> > > > > <ja...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Colin/David,
> > > > > >>
> > > > > >> > Like David said, basically it boils down to creating a feature
> > > flag
> > > > > for
> > > > > >> the new proposed __consumer_offsets version, or using a new
> > > > > >> IBP/metadata.version for it. Both approaches have pros and cons.
> > > Using
> > > > > an
> > > > > >> IBP/metadata.version bump reduces the size of the testing
> matrix.
> > > But
> > > > > >> using
> > > > > >> a feature flag allows people to avoid any bugs or pain
> associated
> > > with
> > > > > the
> > > > > >> change if they don't care about enabling it. This is basically
> the
> > > > > classic
> > > > > >> "should I use a feature flag or not?" discussion and we need to
> > have
> > > > it
> > > > > on
> > > > > >> a case-by-case basis.
> > > > > >>
> > > > > >> I think most users are not going to care to manage versions for
> a
> > > > bunch
> > > > > of
> > > > > >> different features. The IBP today has many shortcomings, but at
> > > least
> > > > > it's
> > > > > >> tied to a version that users understand (i.e. the release
> > version).
> > > > How
> > > > > >> would users know after upgrading to Kafka 3.1, for example, that
> > > they
> > > > > need
> > > > > >> to upgrade the metadata.version to 3  and offsets.version to 4
> (or
> > > > > >> whatever)? It's a lot of overhead trying to understand all of
> the
> > > > > >> potential
> > > > > >> features and what each upgrade actually means to them. I am
> > > wondering
> > > > if
> > > > > >> we
> > > > > >> could give them something more convenient which is tied to the
> > > release
> > > > > >> version. For example, maybe we could use a command like
> > > > `kafka-features
> > > > > >> upgrade --release 3.1`, which the broker would then translate to
> > an
> > > > > >> upgrade
> > > > > >> to the latest versions of the individual features at the time of
> > the
> > > > 3.1
> > > > > >> release. Basically it's just a static map from release version
> to
> > > > > feature
> > > > > >> versions to make the upgrade process more convenient.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Jason
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Nov 17, 2021 at 6:20 PM Jason Gustafson <
> > ja...@confluent.io
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > A few additional questions:
> > > > > >> >
> > > > > >> > 1. Currently the IBP tells us what version of individual
> > > > inter-broker
> > > > > >> RPCs
> > > > > >> > will be used. I think the plan in this KIP is to use
> ApiVersions
> > > > > request
> > > > > >> > instead to find the highest compatible version (just like
> > > clients).
> > > > > Do I
> > > > > >> > have that right?
> > > > > >> >
> > > > > >> > 2. The following wasn't very clear to me:
> > > > > >> >
> > > > > >> > > Brokers will be able to observe changes to metadata.version
> by
> > > > > >> observing
> > > > > >> > the metadata log, and could then submit a new
> ApiVersionsRequest
> > > to
> > > > > the
> > > > > >> > other Kafka nodes.
> > > > > >> >
> > > > > >> > Is the purpose of submitting new ApiVersions requests to
> update
> > > the
> > > > > >> > features or the RPC versions? Does metadata.version also
> > influence
> > > > the
> > > > > >> > versions that a broker advertises? It would help to have more
> > > detail
> > > > > >> about
> > > > > >> > this.
> > > > > >> >
> > > > > >> > 3. I imagine users will want to know before performing an
> > upgrade
> > > > > >> whether
> > > > > >> > downgrading will be safe. Would the --dry-run flag tell them
> > this?
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Jason
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Nov 17, 2021 at 3:55 PM Colin McCabe <
> > cmcc...@apache.org>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> >> On Wed, Nov 17, 2021, at 11:28, Jason Gustafson wrote:
> > > > > >> >> > Hi David,
> > > > > >> >> >
> > > > > >> >> > Forgive me if this ground has been covered already. Today,
> we
> > > > have
> > > > > a
> > > > > >> few
> > > > > >> >> > other things that we have latched onto the IBP, such as
> > > upgrades
> > > > to
> > > > > >> the
> > > > > >> >> > format of records in __consumer_offsets. I've been assuming
> > > that
> > > > > >> >> > metadata.version is not covering this. Is that right or is
> > > there
> > > > > some
> > > > > >> >> other
> > > > > >> >> > plan to take care of cases like this?
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >> I think metadata.version could cover changes to things like
> > > > > >> >> __consumer_offsets, if people want it to. Or to put it
> another
> > > way,
> > > > > >> that is
> > > > > >> >> out of scope for this KIP.
> > > > > >> >>
> > > > > >> >> Like David said, basically it boils down to creating a
> feature
> > > flag
> > > > > for
> > > > > >> >> the new proposed __consumer_offsets version, or using a new
> > > > > >> >> IBP/metadata.version for it. Both approaches have pros and
> > cons.
> > > > > Using
> > > > > >> an
> > > > > >> >> IBP/metadata.version bump reduces the size of the testing
> > matrix.
> > > > But
> > > > > >> using
> > > > > >> >> a feature flag allows people to avoid any bugs or pain
> > associated
> > > > > with
> > > > > >> the
> > > > > >> >> change if they don't care about enabling it. This is
> basically
> > > the
> > > > > >> classic
> > > > > >> >> "should I use a feature flag or not?" discussion and we need
> to
> > > > have
> > > > > >> it on
> > > > > >> >> a case-by-case basis.
> > > > > >> >>
> > > > > >> >> I think it's worth calling out that having a 1:1 mapping
> > between
> > > > IBP
> > > > > >> >> versions and metadata.versions will result in some
> > > > metadata.versions
> > > > > >> that
> > > > > >> >> "don't do anything" (aka they do the same thing as the
> previous
> > > > > >> >> metadata.version). For example, if we change
> StopReplicaRequest
> > > > > again,
> > > > > >> that
> > > > > >> >> will not affect KRaft mode, but probably would require an IBP
> > > bump
> > > > > and
> > > > > >> >> hence a metadata.version bump. I think that's OK. It's not
> that
> > > > > >> different
> > > > > >> >> from updating your IBP and getting support for ZStandard,
> when
> > > your
> > > > > >> >> deployment doesn't use ZStandard compression.
> > > > > >> >>
> > > > > >> >> best,
> > > > > >> >> Colin
> > > > > >> >>
> > > > > >> >> > Thanks,
> > > > > >> >> > Jason
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Wed, Nov 17, 2021 at 10:17 AM Jun Rao
> > > > <j...@confluent.io.invalid
> > > > > >
> > > > > >> >> wrote:
> > > > > >> >> >
> > > > > >> >> >> Hi, Colin,
> > > > > >> >> >>
> > > > > >> >> >> Thanks for the reply.
> > > > > >> >> >>
> > > > > >> >> >> For case b, I am not sure that I understand your
> suggestion.
> > > > Does
> > > > > >> "each
> > > > > >> >> >> subsequent level for metadata.version corresponds to an
> IBP
> > > > > version"
> > > > > >> >> mean
> > > > > >> >> >> that we need to keep IBP forever? Could you describe the
> > > upgrade
> > > > > >> >> process in
> > > > > >> >> >> this case?
> > > > > >> >> >>
> > > > > >> >> >> Jun
> > > > > >> >> >>
> > > > > >> >> >> On Tue, Nov 16, 2021 at 3:45 PM Colin McCabe <
> > > > cmcc...@apache.org>
> > > > > >> >> wrote:
> > > > > >> >> >>
> > > > > >> >> >> > On Tue, Nov 16, 2021, at 15:13, Jun Rao wrote:
> > > > > >> >> >> > > Hi, David, Colin,
> > > > > >> >> >> > >
> > > > > >> >> >> > > Thanks for the reply.
> > > > > >> >> >> > >
> > > > > >> >> >> > > 16. Discussed with David offline a bit. We have 3
> cases.
> > > > > >> >> >> > > a. We upgrade from an old version where the
> > > metadata.version
> > > > > has
> > > > > >> >> >> already
> > > > > >> >> >> > > been finalized. In this case it makes sense to stay
> with
> > > > that
> > > > > >> >> feature
> > > > > >> >> >> > > version after the upgrade.
> > > > > >> >> >> >
> > > > > >> >> >> > +1
> > > > > >> >> >> >
> > > > > >> >> >> > > b. We upgrade from an old version where no
> > > metadata.version
> > > > > has
> > > > > >> >> been
> > > > > >> >> >> > > finalized. In this case, it makes sense to leave
> > > > > >> metadata.version
> > > > > >> >> >> > disabled
> > > > > >> >> >> > > since we don't know if all brokers have been upgraded.
> > > > > >> >> >> >
> > > > > >> >> >> > This is the scenario I was hoping to avoid by saying
> that
> > > ALL
> > > > > >> KRaft
> > > > > >> >> >> > clusters have metadata.version of at least 1, and each
> > > > > subsequent
> > > > > >> >> level
> > > > > >> >> >> for
> > > > > >> >> >> > metadata.version corresponds to an IBP version. The
> > existing
> > > > > KRaft
> > > > > >> >> >> clusters
> > > > > >> >> >> > in 3.0 and earlier are preview (not for production) so I
> > > think
> > > > > >> this
> > > > > >> >> >> change
> > > > > >> >> >> > is OK for 3.x (given that it affects only KRaft). Then
> IBP
> > > is
> > > > > >> >> irrelevant
> > > > > >> >> >> > for KRaft clusters (the config is ignored, possibly
> with a
> > > > WARN
> > > > > or
> > > > > >> >> ERROR
> > > > > >> >> >> > message generated if it is set).
> > > > > >> >> >> >
> > > > > >> >> >> > > c. We are starting from a brand new cluster and of
> > course
> > > no
> > > > > >> >> >> > > metadata.version has been finalized. In this case, the
> > KIP
> > > > > says
> > > > > >> it
> > > > > >> >> will
> > > > > >> >> >> > > pick the metadata.version in meta.properties. In the
> > > common
> > > > > >> case,
> > > > > >> >> >> people
> > > > > >> >> >> > > probably won't set the metadata.version in the
> > > > meta.properties
> > > > > >> file
> > > > > >> >> >> > > explicitly. So, it will be useful to put a default
> > > (stable)
> > > > > >> version
> > > > > >> >> >> there
> > > > > >> >> >> > > when the meta.properties.
> > > > > >> >> >> >
> > > > > >> >> >> > Hmm. I was assuming that clusters where the admin didn't
> > > > specify
> > > > > >> any
> > > > > >> >> >> > metadata.version during formatting would get the latest
> > > > > >> >> metadata.version.
> > > > > >> >> >> > Partly, because this is what we do for IBP today. It
> would
> > > be
> > > > > >> good to
> > > > > >> >> >> > clarify this...
> > > > > >> >> >> >
> > > > > >> >> >> > >
> > > > > >> >> >> > > Also, it would be useful to clarify that if a
> > > > > FeatureLevelRecord
> > > > > >> >> exists
> > > > > >> >> >> > for
> > > > > >> >> >> > > metadata.version, the metadata.version in
> > meta.properties
> > > > will
> > > > > >> be
> > > > > >> >> >> > ignored.
> > > > > >> >> >> > >
> > > > > >> >> >> >
> > > > > >> >> >> > Yeah, I agree.
> > > > > >> >> >> >
> > > > > >> >> >> > best,
> > > > > >> >> >> > Colin
> > > > > >> >> >> >
> > > > > >> >> >> > > Thanks,
> > > > > >> >> >> > >
> > > > > >> >> >> > > Jun
> > > > > >> >> >> > >
> > > > > >> >> >> > >
> > > > > >> >> >> > > On Tue, Nov 16, 2021 at 12:39 PM Colin McCabe <
> > > > > >> cmcc...@apache.org>
> > > > > >> >> >> > wrote:
> > > > > >> >> >> > >
> > > > > >> >> >> > >> On Fri, Nov 5, 2021, at 15:18, Jun Rao wrote:
> > > > > >> >> >> > >> > Hi, David,
> > > > > >> >> >> > >> >
> > > > > >> >> >> > >> > Thanks for the reply.
> > > > > >> >> >> > >> >
> > > > > >> >> >> > >> > 16. My first concern is that the KIP picks up
> > > > meta.version
> > > > > >> >> >> > inconsistently
> > > > > >> >> >> > >> > during the deployment. If a new cluster is started,
> > we
> > > > pick
> > > > > >> up
> > > > > >> >> the
> > > > > >> >> >> > >> highest
> > > > > >> >> >> > >> > version. If we upgrade, we leave the feature
> version
> > > > > >> unchanged.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> Hi Jun,
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> Thanks again for taking a look.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> The proposed behavior in KIP-778 is consistent with
> how
> > > it
> > > > > >> works
> > > > > >> >> >> today.
> > > > > >> >> >> > >> Upgrading the software is distinct from upgrading the
> > > IBP.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> I think it is important to keep these two operations
> > > > > >> ("upgrading
> > > > > >> >> >> > >> IBP/metadata version" and "upgrading software
> version")
> > > > > >> separate.
> > > > > >> >> If
> > > > > >> >> >> > they
> > > > > >> >> >> > >> are coupled it will create a situation where software
> > > > > upgrades
> > > > > >> are
> > > > > >> >> >> > >> difficult and dangerous.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> Consider a situation where you find some bug in your
> > > > current
> > > > > >> >> software,
> > > > > >> >> >> > and
> > > > > >> >> >> > >> you want to upgrade to new software that fixes the
> bug.
> > > If
> > > > > >> >> upgrades
> > > > > >> >> >> and
> > > > > >> >> >> > IBP
> > > > > >> >> >> > >> bumps are coupled, you can't do this without also
> > bumping
> > > > the
> > > > > >> IBP,
> > > > > >> >> >> > which is
> > > > > >> >> >> > >> usually considered a high-risk change. That means
> that
> > > > either
> > > > > >> you
> > > > > >> >> have
> > > > > >> >> >> > to
> > > > > >> >> >> > >> make a special build that includes only the fix
> > > > > (time-consuming
> > > > > >> >> and
> > > > > >> >> >> > >> error-prone), live with the bug for longer, or be
> very
> > > > > >> >> conservative
> > > > > >> >> >> > about
> > > > > >> >> >> > >> ever introducing new IBP/metadata versions. None of
> > those
> > > > are
> > > > > >> >> really
> > > > > >> >> >> > good
> > > > > >> >> >> > >> choices.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> > Intuitively, it seems that independent of how a
> > cluster
> > > > is
> > > > > >> >> deployed,
> > > > > >> >> >> > we
> > > > > >> >> >> > >> > should always pick the same feature version.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> I think it makes sense to draw a distinction between
> > > > > upgrading
> > > > > >> an
> > > > > >> >> >> > existing
> > > > > >> >> >> > >> cluster and deploying a new one. What most people
> want
> > > out
> > > > of
> > > > > >> >> upgrades
> > > > > >> >> >> > is
> > > > > >> >> >> > >> that things should keep working, but with bug fixes.
> If
> > > we
> > > > > >> change
> > > > > >> >> >> that,
> > > > > >> >> >> > it
> > > > > >> >> >> > >> just makes people more reluctant to upgrade (which is
> > > > always
> > > > > a
> > > > > >> >> >> > problem...)
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> > I think we need to think this through in this KIP.
> My
> > > > > second
> > > > > >> >> concern
> > > > > >> >> >> > is
> > > > > >> >> >> > >> > that as a particular version matures, it's
> > inconvenient
> > > > > for a
> > > > > >> >> user
> > > > > >> >> >> to
> > > > > >> >> >> > >> manually
> > > > > >> >> >> > >> > upgrade every feature version. As long as we have a
> > > path
> > > > to
> > > > > >> >> achieve
> > > > > >> >> >> > that
> > > > > >> >> >> > >> in
> > > > > >> >> >> > >> > the future, we don't need to address that in this
> > KIP.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> If people are managing a large number of Kafka
> > clusters,
> > > > they
> > > > > >> will
> > > > > >> >> >> want
> > > > > >> >> >> > to
> > > > > >> >> >> > >> do some sort of A/B testing with IBP/metadata
> versions.
> > > So
> > > > if
> > > > > >> you
> > > > > >> >> have
> > > > > >> >> >> > 1000
> > > > > >> >> >> > >> Kafka clusters, you roll out the new IBP version to
> 10
> > of
> > > > > them
> > > > > >> >> and see
> > > > > >> >> >> > how
> > > > > >> >> >> > >> it goes. If that goes well, you roll it out to more,
> > etc.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> So, the automation needs to be at the cluster
> > management
> > > > > layer,
> > > > > >> >> not at
> > > > > >> >> >> > the
> > > > > >> >> >> > >> Kafka layer. Each Kafka cluster doesn't know how well
> > > > things
> > > > > >> went
> > > > > >> >> in
> > > > > >> >> >> the
> > > > > >> >> >> > >> other 999 clusters. Automatically upgrading is a bad
> > > thing
> > > > > for
> > > > > >> the
> > > > > >> >> >> same
> > > > > >> >> >> > >> reason Kafka automatically upgrading its own software
> > > > version
> > > > > >> >> would
> > > > > >> >> >> be a
> > > > > >> >> >> > >> bad thing -- it could lead to a disruption to a
> > sensitive
> > > > > >> cluster
> > > > > >> >> at
> > > > > >> >> >> the
> > > > > >> >> >> > >> wrong time.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> For people who are just managing one or two Kafka
> > > clusters,
> > > > > >> >> >> > automatically
> > > > > >> >> >> > >> upgrading feature versions isn't a big burden and can
> > be
> > > > done
> > > > > >> >> >> manually.
> > > > > >> >> >> > >> This is all consistent with how IBP works today.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> Also, we already have a command-line option to the
> > > feature
> > > > > tool
> > > > > >> >> which
> > > > > >> >> >> > >> upgrades all features to the latest available, if
> that
> > is
> > > > > what
> > > > > >> the
> > > > > >> >> >> > >> administrator desires. Perhaps we could add
> > documentation
> > > > > >> saying
> > > > > >> >> that
> > > > > >> >> >> > this
> > > > > >> >> >> > >> should be done as the last step of the upgrade.
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> best,
> > > > > >> >> >> > >> Colin
> > > > > >> >> >> > >>
> > > > > >> >> >> > >> >
> > > > > >> >> >> > >> > 21. "./kafka-features.sh delete": Deleting a
> feature
> > > > seems
> > > > > a
> > > > > >> bit
> > > > > >> >> >> weird
> > > > > >> >> >> > >> > since the logic is always there. Would it be better
> > to
> > > > use
> > > > > >> >> disable?
> > > > > >> >> >> > >> >
> > > > > >> >> >> > >> > Jun
> > > > > >> >> >> > >> >
> > > > > >> >> >> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur
> > > > > >> >> >> > >> > <david.art...@confluent.io.invalid> wrote:
> > > > > >> >> >> > >> >
> > > > > >> >> >> > >> >> Colin and Jun, thanks for the additional comments!
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> Colin:
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> > We've been talking about having an automated RPC
> > > > > >> >> compatibility
> > > > > >> >> >> > checker
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> Do we have a way to mark fields in schemas as
> > > > deprecated?
> > > > > It
> > > > > >> >> can
> > > > > >> >> >> > stay in
> > > > > >> >> >> > >> >> the RPC, it just complicates the logic a bit.
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> > It would be nice if the active controller could
> > > > validate
> > > > > >> >> that a
> > > > > >> >> >> > >> majority
> > > > > >> >> >> > >> >> of the quorum could use the proposed
> > metadata.version.
> > > > The
> > > > > >> >> active
> > > > > >> >> >> > >> >> controller should have this information, right? If
> > we
> > > > > don't
> > > > > >> >> have
> > > > > >> >> >> > recent
> > > > > >> >> >> > >> >> information  from a quorum of voters, we wouldn't
> be
> > > > > active.
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> I believe we should have this information from the
> > > > > >> >> >> > ApiVersionsResponse.
> > > > > >> >> >> > >> It
> > > > > >> >> >> > >> >> would be good to do this validation to avoid a
> > > situation
> > > > > >> where
> > > > > >> >> a
> > > > > >> >> >> > >> >> quorum leader can't be elected due to
> unprocessable
> > > > > records.
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> > Do we need delete as a command separate from
> > > > downgrade?
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> I think from an operator's perspective, it is nice
> > to
> > > > > >> >> distinguish
> > > > > >> >> >> > >> between
> > > > > >> >> >> > >> >> changing a feature flag and unsetting it. It might
> > be
> > > > > >> >> surprising to
> > > > > >> >> >> > an
> > > > > >> >> >> > >> >> operator to see the flag's version set to nothing
> > when
> > > > > they
> > > > > >> >> >> requested
> > > > > >> >> >> > >> the
> > > > > >> >> >> > >> >> downgrade to version 0 (or less).
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> > it seems like we should spell out that
> > > > metadata.version
> > > > > >> >> begins at
> > > > > >> >> >> > 1 in
> > > > > >> >> >> > >> >> KRaft clusters
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> I added this text:
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> Introduce an IBP version to indicate the lowest
> > > software
> > > > > >> >> version
> > > > > >> >> >> that
> > > > > >> >> >> > >> >> > supports *metadata.version*. Below this IBP, the
> > > > > >> >> >> > *metadata.version* is
> > > > > >> >> >> > >> >> > undefined and will not be examined. At or above
> > this
> > > > > IBP,
> > > > > >> the
> > > > > >> >> >> > >> >> > *metadata.version* must be *0* for ZooKeeper
> > > clusters
> > > > > and
> > > > > >> >> will be
> > > > > >> >> >> > >> >> > initialized as *1* for KRaft clusters.
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> > We probably also want an RPC implemented by both
> > > > brokers
> > > > > >> and
> > > > > >> >> >> > >> controllers
> > > > > >> >> >> > >> >> that will reveal the min and max supported
> versions
> > > for
> > > > > each
> > > > > >> >> >> feature
> > > > > >> >> >> > >> level
> > > > > >> >> >> > >> >> supported by the server
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> This is available in ApiVersionsResponse (we
> include
> > > the
> > > > > >> >> server's
> > > > > >> >> >> > >> supported
> > > > > >> >> >> > >> >> features as well as the cluster's finalized
> > features)
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> --------
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> Jun:
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> 12. I've updated the KIP with AdminClient changes
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> 14. You're right, it looks like I missed a few
> > > sections
> > > > > >> >> regarding
> > > > > >> >> >> > >> snapshot
> > > > > >> >> >> > >> >> generation. I've corrected it
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> 16. This feels more like an enhancement to
> KIP-584.
> > I
> > > > > agree
> > > > > >> it
> > > > > >> >> >> could
> > > > > >> >> >> > be
> > > > > >> >> >> > >> >> useful, but perhaps we could address it separately
> > > from
> > > > > >> KRaft
> > > > > >> >> >> > upgrades?
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> 20. Indeed snapshots are not strictly necessary
> > during
> > > > an
> > > > > >> >> upgrade,
> > > > > >> >> >> > I've
> > > > > >> >> >> > >> >> reworded this
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> Thanks!
> > > > > >> >> >> > >> >> David
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> On Thu, Nov 4, 2021 at 6:51 PM Jun Rao
> > > > > >> >> <j...@confluent.io.invalid>
> > > > > >> >> >> > >> wrote:
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> > Hi, David, Jose and Colin,
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > Thanks for the reply. A few more comments.
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > 12. It seems that we haven't updated the
> > AdminClient
> > > > > >> >> accordingly?
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > 14. "Metadata snapshot is generated and sent to
> > the
> > > > > other
> > > > > >> >> >> inactive
> > > > > >> >> >> > >> >> > controllers and to brokers". I thought we wanted
> > > each
> > > > > >> broker
> > > > > >> >> to
> > > > > >> >> >> > >> generate
> > > > > >> >> >> > >> >> > its own snapshot independently? If only the
> > > controller
> > > > > >> >> generates
> > > > > >> >> >> > the
> > > > > >> >> >> > >> >> > snapshot, how do we force other brokers to pick
> it
> > > up?
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > 16. If a feature version is new, one may not
> want
> > to
> > > > > >> enable
> > > > > >> >> it
> > > > > >> >> >> > >> >> immediately
> > > > > >> >> >> > >> >> > after the cluster is upgraded. However, if a
> > feature
> > > > > >> version
> > > > > >> >> has
> > > > > >> >> >> > been
> > > > > >> >> >> > >> >> > stable, requiring every user to run a command to
> > > > upgrade
> > > > > >> to
> > > > > >> >> that
> > > > > >> >> >> > >> version
> > > > > >> >> >> > >> >> > seems inconvenient. One way to improve this is
> for
> > > > each
> > > > > >> >> feature
> > > > > >> >> >> to
> > > > > >> >> >> > >> define
> > > > > >> >> >> > >> >> > one version as the default. Then, when we
> upgrade
> > a
> > > > > >> cluster,
> > > > > >> >> we
> > > > > >> >> >> > will
> > > > > >> >> >> > >> >> > automatically upgrade the feature to the default
> > > > > version.
> > > > > >> An
> > > > > >> >> >> admin
> > > > > >> >> >> > >> could
> > > > > >> >> >> > >> >> > use the tool to upgrade to a version higher than
> > the
> > > > > >> default.
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > 20. "The quorum controller can assist with this
> > > > process
> > > > > by
> > > > > >> >> >> > generating
> > > > > >> >> >> > >> a
> > > > > >> >> >> > >> >> > metadata snapshot after a metadata.version
> > increase
> > > > has
> > > > > >> been
> > > > > >> >> >> > >> committed to
> > > > > >> >> >> > >> >> > the metadata log. This snapshot will be a
> > convenient
> > > > way
> > > > > >> to
> > > > > >> >> let
> > > > > >> >> >> > broker
> > > > > >> >> >> > >> >> and
> > > > > >> >> >> > >> >> > controller components rebuild their entire
> > in-memory
> > > > > state
> > > > > >> >> >> > following
> > > > > >> >> >> > >> an
> > > > > >> >> >> > >> >> > upgrade." The new version of the software could
> > read
> > > > > both
> > > > > >> >> the new
> > > > > >> >> >> > and
> > > > > >> >> >> > >> the
> > > > > >> >> >> > >> >> > old version. Is generating a new snapshot during
> > > > upgrade
> > > > > >> >> needed?
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > Jun
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <
> > > > > >> >> cmcc...@apache.org>
> > > > > >> >> >> > >> wrote:
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> > > > > >> >> >> > >> >> > > > Hi, David,
> > > > > >> >> >> > >> >> > > >
> > > > > >> >> >> > >> >> > > > One more comment.
> > > > > >> >> >> > >> >> > > >
> > > > > >> >> >> > >> >> > > > 16. The main reason why KIP-584 requires
> > > > finalizing
> > > > > a
> > > > > >> >> feature
> > > > > >> >> >> > >> >> manually
> > > > > >> >> >> > >> >> > is
> > > > > >> >> >> > >> >> > > > that in the ZK world, the controller doesn't
> > > know
> > > > > all
> > > > > >> >> brokers
> > > > > >> >> >> > in a
> > > > > >> >> >> > >> >> > > cluster.
> > > > > >> >> >> > >> >> > > > A broker temporarily down is not registered
> in
> > > ZK.
> > > > > in
> > > > > >> the
> > > > > >> >> >> KRaft
> > > > > >> >> >> > >> >> world,
> > > > > >> >> >> > >> >> > > the
> > > > > >> >> >> > >> >> > > > controller keeps track of all brokers,
> > including
> > > > > those
> > > > > >> >> that
> > > > > >> >> >> are
> > > > > >> >> >> > >> >> > > temporarily
> > > > > >> >> >> > >> >> > > > down. This makes it possible for the
> > controller
> > > to
> > > > > >> >> >> > automatically
> > > > > >> >> >> > >> >> > > finalize a
> > > > > >> >> >> > >> >> > > > feature---it's safe to do so when all
> brokers
> > > > > support
> > > > > >> >> that
> > > > > >> >> >> > >> feature.
> > > > > >> >> >> > >> >> > This
> > > > > >> >> >> > >> >> > > > will make the upgrade process much simpler
> > since
> > > > no
> > > > > >> >> manual
> > > > > >> >> >> > >> command is
> > > > > >> >> >> > >> >> > > > required to turn on a new feature. Have we
> > > > > considered
> > > > > >> >> this?
> > > > > >> >> >> > >> >> > > >
> > > > > >> >> >> > >> >> > > > Thanks,
> > > > > >> >> >> > >> >> > > >
> > > > > >> >> >> > >> >> > > > Jun
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> > > Hi Jun,
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> > > I guess David commented on this point already,
> > but
> > > > > I'll
> > > > > >> >> comment
> > > > > >> >> >> > as
> > > > > >> >> >> > >> >> well.
> > > > > >> >> >> > >> >> > I
> > > > > >> >> >> > >> >> > > always had the perception that users viewed
> > rolls
> > > as
> > > > > >> >> >> potentially
> > > > > >> >> >> > >> risky
> > > > > >> >> >> > >> >> > and
> > > > > >> >> >> > >> >> > > were looking for ways to reduce the risk. Not
> > > > enabling
> > > > > >> >> features
> > > > > >> >> >> > >> right
> > > > > >> >> >> > >> >> > away
> > > > > >> >> >> > >> >> > > after installing new software seems like one
> way
> > > to
> > > > do
> > > > > >> >> that. If
> > > > > >> >> >> > we
> > > > > >> >> >> > >> had
> > > > > >> >> >> > >> >> a
> > > > > >> >> >> > >> >> > > feature to automatically upgrade during a
> roll,
> > > I'm
> > > > > not
> > > > > >> >> sure
> > > > > >> >> >> > that I
> > > > > >> >> >> > >> >> would
> > > > > >> >> >> > >> >> > > recommend that people use it, because if
> > something
> > > > > >> fails,
> > > > > >> >> it
> > > > > >> >> >> > makes
> > > > > >> >> >> > >> it
> > > > > >> >> >> > >> >> > > harder to tell if the new feature is at fault,
> > or
> > > > > >> something
> > > > > >> >> >> else
> > > > > >> >> >> > in
> > > > > >> >> >> > >> the
> > > > > >> >> >> > >> >> > new
> > > > > >> >> >> > >> >> > > software.
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> > > We already tell users to do a "double roll"
> when
> > > > going
> > > > > >> to
> > > > > >> >> a new
> > > > > >> >> >> > IBP.
> > > > > >> >> >> > >> >> > (Just
> > > > > >> >> >> > >> >> > > to give background to people who haven't heard
> > > that
> > > > > >> >> phrase, the
> > > > > >> >> >> > >> first
> > > > > >> >> >> > >> >> > roll
> > > > > >> >> >> > >> >> > > installs the new software, and the second roll
> > > > updates
> > > > > >> the
> > > > > >> >> >> IBP).
> > > > > >> >> >> > So
> > > > > >> >> >> > >> >> this
> > > > > >> >> >> > >> >> > > KIP-778 mechanism is basically very similar to
> > > that,
> > > > > >> >> except the
> > > > > >> >> >> > >> second
> > > > > >> >> >> > >> >> > > thing isn't a roll, but just an upgrade
> command.
> > > So
> > > > I
> > > > > >> think
> > > > > >> >> >> this
> > > > > >> >> >> > is
> > > > > >> >> >> > >> >> > > consistent with what we currently do.
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> > > Also, just like David said, we can always add
> > > > > >> auto-upgrade
> > > > > >> >> >> later
> > > > > >> >> >> > if
> > > > > >> >> >> > >> >> there
> > > > > >> >> >> > >> >> > > is demand...
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> > > best,
> > > > > >> >> >> > >> >> > > Colin
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> > > >
> > > > > >> >> >> > >> >> > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <
> > > > > >> j...@confluent.io
> > > > > >> >> >
> > > > > >> >> >> > wrote:
> > > > > >> >> >> > >> >> > > >
> > > > > >> >> >> > >> >> > > >> Hi, David,
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> Thanks for the KIP. A few comments below.
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> 10. It would be useful to describe how the
> > > > > controller
> > > > > >> >> node
> > > > > >> >> >> > >> >> determines
> > > > > >> >> >> > >> >> > > the
> > > > > >> >> >> > >> >> > > >> RPC version used to communicate to other
> > > > controller
> > > > > >> >> nodes.
> > > > > >> >> >> > There
> > > > > >> >> >> > >> >> seems
> > > > > >> >> >> > >> >> > > to
> > > > > >> >> >> > >> >> > > >> be a bootstrap problem. A controller node
> > can't
> > > > > read
> > > > > >> >> the log
> > > > > >> >> >> > and
> > > > > >> >> >> > >> >> > > >> therefore the feature level until a quorum
> > > leader
> > > > > is
> > > > > >> >> >> elected.
> > > > > >> >> >> > But
> > > > > >> >> >> > >> >> > leader
> > > > > >> >> >> > >> >> > > >> election requires an RPC.
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> 11. For downgrades, it would be useful to
> > > > describe
> > > > > >> how
> > > > > >> >> to
> > > > > >> >> >> > >> determine
> > > > > >> >> >> > >> >> > the
> > > > > >> >> >> > >> >> > > >> downgrade process (generating new snapshot,
> > > > > >> propagating
> > > > > >> >> the
> > > > > >> >> >> > >> >> snapshot,
> > > > > >> >> >> > >> >> > > etc)
> > > > > >> >> >> > >> >> > > >> has completed. We could block the
> > UpdateFeature
> > > > > >> request
> > > > > >> >> >> until
> > > > > >> >> >> > the
> > > > > >> >> >> > >> >> > > process
> > > > > >> >> >> > >> >> > > >> is completed. However, since the process
> > could
> > > > take
> > > > > >> >> time,
> > > > > >> >> >> the
> > > > > >> >> >> > >> >> request
> > > > > >> >> >> > >> >> > > could
> > > > > >> >> >> > >> >> > > >> time out. Another way is through
> > > DescribeFeature
> > > > > and
> > > > > >> the
> > > > > >> >> >> > server
> > > > > >> >> >> > >> only
> > > > > >> >> >> > >> >> > > >> reports downgraded versions after the
> process
> > > is
> > > > > >> >> completed.
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> 12. Since we are changing
> > > UpdateFeaturesRequest,
> > > > do
> > > > > >> we
> > > > > >> >> need
> > > > > >> >> >> to
> > > > > >> >> >> > >> >> change
> > > > > >> >> >> > >> >> > > the
> > > > > >> >> >> > >> >> > > >> AdminClient api for updateFeatures too?
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> 13. For the paragraph starting with "In the
> > > > absence
> > > > > >> of
> > > > > >> >> an
> > > > > >> >> >> > >> operator
> > > > > >> >> >> > >> >> > > >> defined value for metadata.version", in
> > > KIP-584,
> > > > we
> > > > > >> >> >> described
> > > > > >> >> >> > >> how to
> > > > > >> >> >> > >> >> > > >> finalize features with New cluster
> bootstrap.
> > > In
> > > > > that
> > > > > >> >> case,
> > > > > >> >> >> > it's
> > > > > >> >> >> > >> >> > > >> inconvenient for the users to have to run
> an
> > > > admin
> > > > > >> tool
> > > > > >> >> to
> > > > > >> >> >> > >> finalize
> > > > > >> >> >> > >> >> > the
> > > > > >> >> >> > >> >> > > >> version for each feature. Instead, the
> system
> > > > > detects
> > > > > >> >> that
> > > > > >> >> >> the
> > > > > >> >> >> > >> >> > /features
> > > > > >> >> >> > >> >> > > >> path is missing in ZK and thus
> automatically
> > > > > >> finalizes
> > > > > >> >> every
> > > > > >> >> >> > >> feature
> > > > > >> >> >> > >> >> > > with
> > > > > >> >> >> > >> >> > > >> the latest supported version. Could we do
> > > > something
> > > > > >> >> similar
> > > > > >> >> >> in
> > > > > >> >> >> > >> the
> > > > > >> >> >> > >> >> > KRaft
> > > > > >> >> >> > >> >> > > >> mode?
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> 14. After the quorum leader generates a new
> > > > > snapshot,
> > > > > >> >> how do
> > > > > >> >> >> > we
> > > > > >> >> >> > >> >> force
> > > > > >> >> >> > >> >> > > >> other nodes to pick up the new snapshot?
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> 15. I agree with Jose that it will be
> useful
> > to
> > > > > >> describe
> > > > > >> >> >> when
> > > > > >> >> >> > >> >> > > generating a
> > > > > >> >> >> > >> >> > > >> new snapshot is needed. To me, it seems the
> > new
> > > > > >> >> snapshot is
> > > > > >> >> >> > only
> > > > > >> >> >> > >> >> > needed
> > > > > >> >> >> > >> >> > > >> when incompatible changes are made.
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> 7. Jose, what control records were you
> > > referring?
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> Thanks,
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> Jun
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >> On Tue, Oct 5, 2021 at 8:53 AM David
> Arthur <
> > > > > >> >> >> > >> davidart...@apache.org
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >> > > >> wrote:
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > > >>> Jose, thanks for the thorough review and
> > > > comments!
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> I am out of the office until next week,
> so I
> > > > > >> probably
> > > > > >> >> won't
> > > > > >> >> >> > be
> > > > > >> >> >> > >> able
> > > > > >> >> >> > >> >> > to
> > > > > >> >> >> > >> >> > > >>> update the KIP until then. Here are some
> > > replies
> > > > > to
> > > > > >> >> your
> > > > > >> >> >> > >> questions:
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> 1. Generate snapshot on upgrade
> > > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and
> sent
> > to
> > > > the
> > > > > >> >> other
> > > > > >> >> >> > nodes
> > > > > >> >> >> > >> >> > > >>> > Why does the Active Controller need to
> > > > generate
> > > > > a
> > > > > >> new
> > > > > >> >> >> > snapshot
> > > > > >> >> >> > >> >> and
> > > > > >> >> >> > >> >> > > >>> > force a snapshot fetch from the replicas
> > > > > (inactive
> > > > > >> >> >> > controller
> > > > > >> >> >> > >> and
> > > > > >> >> >> > >> >> > > >>> > brokers) on an upgrade? Isn't writing
> the
> > > > > >> >> >> > FeatureLevelRecord
> > > > > >> >> >> > >> good
> > > > > >> >> >> > >> >> > > >>> > enough to communicate the upgrade to the
> > > > > replicas?
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> You're right, we don't necessarily need to
> > > > > >> _transmit_ a
> > > > > >> >> >> > >> snapshot,
> > > > > >> >> >> > >> >> > since
> > > > > >> >> >> > >> >> > > >>> each node can generate its own equivalent
> > > > snapshot
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> 2. Generate snapshot on downgrade
> > > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and
> sent
> > to
> > > > the
> > > > > >> >> other
> > > > > >> >> >> > >> inactive
> > > > > >> >> >> > >> >> > > >>> > controllers and to brokers (this
> snapshot
> > > may
> > > > be
> > > > > >> >> lossy!)
> > > > > >> >> >> > >> >> > > >>> > Why do we need to send this downgraded
> > > > snapshot
> > > > > to
> > > > > >> >> the
> > > > > >> >> >> > >> brokers?
> > > > > >> >> >> > >> >> The
> > > > > >> >> >> > >> >> > > >>> > replicas have seen the
> FeatureLevelRecord
> > > and
> > > > > >> >> noticed the
> > > > > >> >> >> > >> >> > downgrade.
> > > > > >> >> >> > >> >> > > >>> > Can we have the replicas each
> > independently
> > > > > >> generate
> > > > > >> >> a
> > > > > >> >> >> > >> downgraded
> > > > > >> >> >> > >> >> > > >>> > snapshot at the offset for the
> downgraded
> > > > > >> >> >> > FeatureLevelRecord?
> > > > > >> >> >> > >> I
> > > > > >> >> >> > >> >> > > assume
> > > > > >> >> >> > >> >> > > >>> > that the active controller will
> guarantee
> > > that
> > > > > all
> > > > > >> >> >> records
> > > > > >> >> >> > >> after
> > > > > >> >> >> > >> >> > the
> > > > > >> >> >> > >> >> > > >>> > FatureLevelRecord use the downgraded
> > > version.
> > > > If
> > > > > >> so,
> > > > > >> >> it
> > > > > >> >> >> > would
> > > > > >> >> >> > >> be
> > > > > >> >> >> > >> >> > good
> > > > > >> >> >> > >> >> > > >>> > to mention that explicitly.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> Similar to above, yes a broker that
> detects
> > a
> > > > > >> >> downgrade via
> > > > > >> >> >> > >> >> > > >>> FeatureLevelRecord could generate its own
> > > > > downgrade
> > > > > >> >> >> snapshot
> > > > > >> >> >> > and
> > > > > >> >> >> > >> >> > reload
> > > > > >> >> >> > >> >> > > >>> its
> > > > > >> >> >> > >> >> > > >>> state from that. This does get a little
> > fuzzy
> > > > when
> > > > > >> we
> > > > > >> >> >> > consider
> > > > > >> >> >> > >> >> cases
> > > > > >> >> >> > >> >> > > where
> > > > > >> >> >> > >> >> > > >>> brokers are on different software versions
> > and
> > > > > >> could be
> > > > > >> >> >> > >> generating
> > > > > >> >> >> > >> >> a
> > > > > >> >> >> > >> >> > > >>> downgrade snapshot for version X, but
> using
> > > > > >> different
> > > > > >> >> >> > versions
> > > > > >> >> >> > >> of
> > > > > >> >> >> > >> >> the
> > > > > >> >> >> > >> >> > > >>> code.
> > > > > >> >> >> > >> >> > > >>> It might be safer to let the controller
> > > generate
> > > > > the
> > > > > >> >> >> > snapshot so
> > > > > >> >> >> > >> >> each
> > > > > >> >> >> > >> >> > > >>> broker (regardless of software version)
> gets
> > > the
> > > > > >> same
> > > > > >> >> >> > records.
> > > > > >> >> >> > >> >> > However,
> > > > > >> >> >> > >> >> > > >>> for
> > > > > >> >> >> > >> >> > > >>> upgrades (or downgrades) we expect the
> whole
> > > > > cluster
> > > > > >> >> to be
> > > > > >> >> >> > >> running
> > > > > >> >> >> > >> >> > the
> > > > > >> >> >> > >> >> > > >>> same
> > > > > >> >> >> > >> >> > > >>> software version before triggering the
> > > > > >> metadata.version
> > > > > >> >> >> > change,
> > > > > >> >> >> > >> so
> > > > > >> >> >> > >> >> > > perhaps
> > > > > >> >> >> > >> >> > > >>> this isn't a likely scenario. Thoughts?
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> 3. Max metadata version
> > > > > >> >> >> > >> >> > > >>> > >For the first release that supports
> > > > > >> >> metadata.version, we
> > > > > >> >> >> > can
> > > > > >> >> >> > >> >> > simply
> > > > > >> >> >> > >> >> > > >>> > initialize metadata.version with the
> > current
> > > > > (and
> > > > > >> >> only)
> > > > > >> >> >> > >> version.
> > > > > >> >> >> > >> >> > For
> > > > > >> >> >> > >> >> > > >>> future
> > > > > >> >> >> > >> >> > > >>> > releases, we will need a mechanism to
> > > > bootstrap
> > > > > a
> > > > > >> >> >> > particular
> > > > > >> >> >> > >> >> > version.
> > > > > >> >> >> > >> >> > > >>> This
> > > > > >> >> >> > >> >> > > >>> > could be done using the meta.properties
> > file
> > > > or
> > > > > >> some
> > > > > >> >> >> > similar
> > > > > >> >> >> > >> >> > > mechanism.
> > > > > >> >> >> > >> >> > > >>> The
> > > > > >> >> >> > >> >> > > >>> > reason we need the allow for a specific
> > > > initial
> > > > > >> >> version
> > > > > >> >> >> is
> > > > > >> >> >> > to
> > > > > >> >> >> > >> >> > support
> > > > > >> >> >> > >> >> > > >>> the
> > > > > >> >> >> > >> >> > > >>> > use case of starting a Kafka cluster at
> > > > version
> > > > > X
> > > > > >> >> with an
> > > > > >> >> >> > >> older
> > > > > >> >> >> > >> >> > > >>> > metadata.version.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> I assume that the Active Controller will
> > learn
> > > > the
> > > > > >> >> metadata
> > > > > >> >> >> > >> version
> > > > > >> >> >> > >> >> > of
> > > > > >> >> >> > >> >> > > >>> > the broker through the
> > > > > BrokerRegistrationRequest.
> > > > > >> How
> > > > > >> >> >> will
> > > > > >> >> >> > the
> > > > > >> >> >> > >> >> > Active
> > > > > >> >> >> > >> >> > > >>> > Controller learn about the max metadata
> > > > version
> > > > > of
> > > > > >> >> the
> > > > > >> >> >> > >> inactive
> > > > > >> >> >> > >> >> > > >>> > controller nodes? We currently don't
> send
> > a
> > > > > >> >> registration
> > > > > >> >> >> > >> request
> > > > > >> >> >> > >> >> > from
> > > > > >> >> >> > >> >> > > >>> > the inactive controller to the active
> > > > > controller.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> This came up during the design, but I
> > > neglected
> > > > to
> > > > > >> add
> > > > > >> >> it
> > > > > >> >> >> to
> > > > > >> >> >> > the
> > > > > >> >> >> > >> >> KIP.
> > > > > >> >> >> > >> >> > > We
> > > > > >> >> >> > >> >> > > >>> will need a mechanism for determining the
> > > > > supported
> > > > > >> >> >> features
> > > > > >> >> >> > of
> > > > > >> >> >> > >> >> each
> > > > > >> >> >> > >> >> > > >>> controller similar to how brokers use
> > > > > >> >> >> > BrokerRegistrationRequest.
> > > > > >> >> >> > >> >> > > Perhaps
> > > > > >> >> >> > >> >> > > >>> controllers could write a
> FeatureLevelRecord
> > > (or
> > > > > >> >> similar)
> > > > > >> >> >> to
> > > > > >> >> >> > the
> > > > > >> >> >> > >> >> > > metadata
> > > > > >> >> >> > >> >> > > >>> log indicating their supported version.
> > WDYT?
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> Why do you need to bootstrap a particular
> > > > version?
> > > > > >> >> Isn't
> > > > > >> >> >> the
> > > > > >> >> >> > >> intent
> > > > > >> >> >> > >> >> > > >>> > that the broker will learn the active
> > > metadata
> > > > > >> >> version by
> > > > > >> >> >> > >> reading
> > > > > >> >> >> > >> >> > the
> > > > > >> >> >> > >> >> > > >>> > metadata before unfencing?
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> This bootstrapping is needed for when a
> > KRaft
> > > > > >> cluster
> > > > > >> >> is
> > > > > >> >> >> > first
> > > > > >> >> >> > >> >> > > started. If
> > > > > >> >> >> > >> >> > > >>> we don't have this mechanism, the cluster
> > > can't
> > > > > >> really
> > > > > >> >> do
> > > > > >> >> >> > >> anything
> > > > > >> >> >> > >> >> > > until
> > > > > >> >> >> > >> >> > > >>> the operator finalizes the
> metadata.version
> > > with
> > > > > the
> > > > > >> >> tool.
> > > > > >> >> >> > The
> > > > > >> >> >> > >> >> > > >>> bootstrapping will be done by the
> controller
> > > and
> > > > > the
> > > > > >> >> >> brokers
> > > > > >> >> >> > >> will
> > > > > >> >> >> > >> >> see
> > > > > >> >> >> > >> >> > > this
> > > > > >> >> >> > >> >> > > >>> version as a record (like you say). I'll
> add
> > > > some
> > > > > >> text
> > > > > >> >> to
> > > > > >> >> >> > >> clarify
> > > > > >> >> >> > >> >> > this.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> 4. Reject Registration - This is related
> to
> > > the
> > > > > >> bullet
> > > > > >> >> >> point
> > > > > >> >> >> > >> above.
> > > > > >> >> >> > >> >> > > >>> > What will be the behavior of the active
> > > > > controller
> > > > > >> >> if the
> > > > > >> >> >> > >> broker
> > > > > >> >> >> > >> >> > > sends
> > > > > >> >> >> > >> >> > > >>> > a metadata version that is not
> compatible
> > > with
> > > > > the
> > > > > >> >> >> cluster
> > > > > >> >> >> > >> wide
> > > > > >> >> >> > >> >> > > >>> > metadata version?
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> If a broker starts up with a lower
> supported
> > > > > version
> > > > > >> >> range
> > > > > >> >> >> > than
> > > > > >> >> >> > >> the
> > > > > >> >> >> > >> >> > > >>> current
> > > > > >> >> >> > >> >> > > >>> cluster metadata.version, it should log an
> > > error
> > > > > and
> > > > > >> >> >> > shutdown.
> > > > > >> >> >> > >> This
> > > > > >> >> >> > >> >> > is
> > > > > >> >> >> > >> >> > > in
> > > > > >> >> >> > >> >> > > >>> line with KIP-584.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> 5. Discover upgrade
> > > > > >> >> >> > >> >> > > >>> > > This snapshot will be a convenient way
> > to
> > > > let
> > > > > >> >> broker
> > > > > >> >> >> and
> > > > > >> >> >> > >> >> > controller
> > > > > >> >> >> > >> >> > > >>> > components rebuild their entire
> in-memory
> > > > state
> > > > > >> >> following
> > > > > >> >> >> > an
> > > > > >> >> >> > >> >> > upgrade.
> > > > > >> >> >> > >> >> > > >>> > Can we rely on the presence of the
> > > > > >> >> FeatureLevelRecord for
> > > > > >> >> >> > the
> > > > > >> >> >> > >> >> > > metadata
> > > > > >> >> >> > >> >> > > >>> > version for this functionality? If so,
> it
> > > > avoids
> > > > > >> >> having
> > > > > >> >> >> to
> > > > > >> >> >> > >> reload
> > > > > >> >> >> > >> >> > the
> > > > > >> >> >> > >> >> > > >>> > snapshot.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> For upgrades, yes probably since we won't
> > need
> > > > to
> > > > > >> >> "rewrite"
> > > > > >> >> >> > any
> > > > > >> >> >> > >> >> > > records in
> > > > > >> >> >> > >> >> > > >>> this case. For downgrades, we will need to
> > > > > generate
> > > > > >> the
> > > > > >> >> >> > snapshot
> > > > > >> >> >> > >> >> and
> > > > > >> >> >> > >> >> > > >>> reload
> > > > > >> >> >> > >> >> > > >>> everything.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> 6. Metadata version specification
> > > > > >> >> >> > >> >> > > >>> > >  V4(version=4,
> > > isBackwardsCompatible=false,
> > > > > >> >> >> > description="New
> > > > > >> >> >> > >> >> > > metadata
> > > > > >> >> >> > >> >> > > >>> > record type Bar"),
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> Very cool. Do you have plans to generate
> > > Apache
> > > > > >> Kafka
> > > > > >> >> HTML
> > > > > >> >> >> > >> >> > > >>> > documentation for this information?
> Would
> > be
> > > > > >> helpful
> > > > > >> >> to
> > > > > >> >> >> > >> display
> > > > > >> >> >> > >> >> > this
> > > > > >> >> >> > >> >> > > >>> > information to the user using the
> > > > > >> kafka-features.sh
> > > > > >> >> and
> > > > > >> >> >> > >> feature
> > > > > >> >> >> > >> >> > RPC?
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> Hm good idea :) I'll add a brief section
> on
> > > > > >> >> documentation.
> > > > > >> >> >> > This
> > > > > >> >> >> > >> >> would
> > > > > >> >> >> > >> >> > > >>> certainly be very useful
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> 7.Downgrade records
> > > > > >> >> >> > >> >> > > >>> > I think we should explicitly mention
> that
> > > the
> > > > > >> >> downgrade
> > > > > >> >> >> > >> process
> > > > > >> >> >> > >> >> > will
> > > > > >> >> >> > >> >> > > >>> > downgrade both metadata records and
> > > controller
> > > > > >> >> records.
> > > > > >> >> >> In
> > > > > >> >> >> > >> >> KIP-630
> > > > > >> >> >> > >> >> > we
> > > > > >> >> >> > >> >> > > >>> > introduced two control records for
> > > snapshots.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> Yes, good call. Let me re-read that KIP
> and
> > > > > include
> > > > > >> >> some
> > > > > >> >> >> > >> details.
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> Thanks again for the comments!
> > > > > >> >> >> > >> >> > > >>> -David
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> On Wed, Sep 29, 2021 at 5:09 PM José
> Armando
> > > > > García
> > > > > >> >> Sancio
> > > > > >> >> >> > >> >> > > >>> <jsan...@confluent.io.invalid> wrote:
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>> > One more comment.
> > > > > >> >> >> > >> >> > > >>> >
> > > > > >> >> >> > >> >> > > >>> > 7.Downgrade records
> > > > > >> >> >> > >> >> > > >>> > I think we should explicitly mention
> that
> > > the
> > > > > >> >> downgrade
> > > > > >> >> >> > >> process
> > > > > >> >> >> > >> >> > will
> > > > > >> >> >> > >> >> > > >>> > downgrade both metadata records and
> > > controller
> > > > > >> >> records.
> > > > > >> >> >> In
> > > > > >> >> >> > >> >> KIP-630
> > > > > >> >> >> > >> >> > we
> > > > > >> >> >> > >> >> > > >>> > introduced two control records for
> > > snapshots.
> > > > > >> >> >> > >> >> > > >>> >
> > > > > >> >> >> > >> >> > > >>>
> > > > > >> >> >> > >> >> > > >>
> > > > > >> >> >> > >> >> > >
> > > > > >> >> >> > >> >> >
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >> >> --
> > > > > >> >> >> > >> >> -David
> > > > > >> >> >> > >> >>
> > > > > >> >> >> > >>
> > > > > >> >> >> >
> > > > > >> >> >>
> > > > > >> >>
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -David
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -David
> > > > >
> > > >
> > >
> > >
> > > --
> > > -David
> > >
> >
>
>
> --
> -David
>

Reply via email to