Jun and Colin, thanks very much for the comments. See below.

10. Colin, I agree that ApiVersionsRequest/Response is the most
straightforward approach here.

11. This brings up an interesting point. Once an UpdateFeatures request has
finished processing, a subsequent "describe features" (ApiVersionsRequest)
made by the admin client will go to a random broker, and so might not
reflect the feature update. We could add blocking behavior to the admin
client so that it polls ApiVersions for some time until the expected
version is reflected on that broker. However, this does not mean _every_
broker has finished upgrading/downgrading -- just the one handling that
request. Maybe we could have the admin client poll all the brokers until
the expected version is seen.
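
To make that concrete, here is a rough sketch of what the client-side
polling could look like today with the existing KIP-584 Admin API. The
bootstrap servers, the feature name, the target level (4), and the 30
second timeout are all placeholders, and as noted above a successful
check only tells us about whichever broker happened to serve the request:

import java.time.Duration;
import java.time.Instant;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.FeatureMetadata;
import org.apache.kafka.clients.admin.FinalizedVersionRange;

public class AwaitFeatureLevel {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        String feature = "metadata.version";               // placeholder
        short expectedLevel = 4;                            // placeholder

        try (Admin admin = Admin.create(props)) {
            Instant deadline = Instant.now().plus(Duration.ofSeconds(30));
            while (Instant.now().isBefore(deadline)) {
                // Each call may be served by a different broker, so success
                // only means that *some* broker has seen the new level.
                FeatureMetadata metadata =
                    admin.describeFeatures().featureMetadata().get();
                FinalizedVersionRange range =
                    metadata.finalizedFeatures().get(feature);
                if (range != null && range.maxVersionLevel() >= expectedLevel) {
                    System.out.println(feature + " reached level " + expectedLevel);
                    return;
                }
                Thread.sleep(1000);
            }
            throw new RuntimeException("Timed out waiting for " + feature);
        }
    }
}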

If at a later time a broker comes online that needs to process an
upgrade/downgrade, I don't think there will be a problem, since the broker
will be fenced until it catches up on the latest metadata (which will
include the feature level).

12. Yes, we will need changes to the admin client for "updateFeatures".
I'll update the KIP to reflect this.
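
For reference, this is roughly how the existing KIP-584 API is invoked
today (the feature name and target level below are just placeholders).
Whatever new fields we add to UpdateFeaturesRequest would need matching
additions on this side:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.FeatureUpdate;
import org.apache.kafka.clients.admin.UpdateFeaturesOptions;

public class FinalizeMetadataVersion {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // KIP-584 shape: a target max version level plus an
            // allowDowngrade flag. New UpdateFeaturesRequest fields would
            // surface here as well.
            FeatureUpdate update = new FeatureUpdate((short) 4, false);
            admin.updateFeatures(
                Collections.singletonMap("metadata.version", update),
                new UpdateFeaturesOptions()
            ).all().get();
        }
    }
}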

13. I'll expand the paragraph on the initial "metadata.version" into its
own section and add some detail.

14/15. As mentioned, I think we can avoid forcing a snapshot fetch and
instead rely on brokers/controllers generating their own snapshots. We will
probably want some kind of manual recovery mode where we can force-load a
snapshot, but that is out of scope here (I think...)

16. Automatic upgrades should be feasible, but I think we will want to
start with manual upgrades (while we work out the design and fix bugs).
Following the design detailed in this KIP, we could have a controller
component that (as you suggest) automatically finalizes the feature to the
max of all broker supported versions. I can include a section on this or we
could defer to a future KIP. WDYT?
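
As a rough illustration of what I mean (all of these names are made up,
nothing like this exists yet), the controller component could recompute
the highest metadata.version level that every registered broker supports
whenever the set of registrations changes, and finalize it if it would be
an upgrade:

import java.util.Collection;
import java.util.Optional;

// Hypothetical controller-side helper; none of these types exist today.
public final class AutoFeatureFinalizer {

    // Max supported metadata.version level reported by a registered broker.
    public static final class BrokerSupport {
        final int brokerId;
        final short maxSupportedLevel;

        BrokerSupport(int brokerId, short maxSupportedLevel) {
            this.brokerId = brokerId;
            this.maxSupportedLevel = maxSupportedLevel;
        }
    }

    // Returns the level to auto-finalize: the highest level supported by
    // every registered broker, or empty if that would not be an upgrade.
    public static Optional<Short> levelToFinalize(
            Collection<BrokerSupport> brokers, short currentFinalizedLevel) {
        if (brokers.isEmpty()) {
            return Optional.empty();
        }
        short candidate = Short.MAX_VALUE;
        for (BrokerSupport broker : brokers) {
            candidate = (short) Math.min(candidate, broker.maxSupportedLevel);
        }
        return candidate > currentFinalizedLevel
            ? Optional.of(candidate)
            : Optional.empty();
    }
}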

-David


On Tue, Oct 12, 2021 at 1:57 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Thu, Oct 7, 2021, at 17:19, Jun Rao wrote:
> > Hi, David,
> >
> > Thanks for the KIP. A few comments below.
> >
> > 10. It would be useful to describe how the controller node determines the
> > RPC version used to communicate to other controller nodes. There seems to
> > be a bootstrap problem. A controller node can't read the log and
> > therefore the feature level until a quorum leader is elected. But leader
> > election requires an RPC.
> >
>
> Hi Jun,
>
> I agree that we need to figure this out. I think it would be best to
> simply use ApiVersionsRequest and ApiVersionsResponse. That way each
> controller can use the appropriate RPC versions when communicating with
> each other controller. This would allow us to upgrade them one by one.
>
> > 11. For downgrades, it would be useful to describe how to determine the
> > downgrade process (generating new snapshot, propagating the snapshot,
> etc)
> > has completed. We could block the UpdateFeature request until the process
> > is completed. However, since the process could take time, the request
> could
> > time out. Another way is through DescribeFeature and the server only
> > reports downgraded versions after the process is completed.
>
> Hmm.. I think we need to avoid blocking, since we don't know how long it
> will take for all nodes to act on the downgrade request. After all, some
> nodes may be down.
>
> But I agree we should have some way of knowing when the upgrade is done.
> DescribeClusterResponse seems like the natural place to put information
> about each node's feature level. While we're at it, we should also add a
> boolean to indicate whether the given node is fenced. (This will always be
> false for ZK mode, of course...)
>
> >
> > 12. Since we are changing UpdateFeaturesRequest, do we need to change the
> > AdminClient api for updateFeatures too?
> >
> > 13. For the paragraph starting with "In the absence of an operator
> defined
> > value for metadata.version", in KIP-584, we described how to finalize
> > features with New cluster bootstrap. In that case, it's inconvenient for
> > the users to have to run an admin tool to finalize the version for each
> > feature. Instead, the system detects that the /features path is missing
> in
> > ZK and thus automatically finalizes every feature with the latest
> supported
> > version. Could we do something similar in the KRaft mode?
> >
>
> Yes, I think we need to have a section describing how this ties into
> creating new clusters. The simplest thing is probably to have the
> controller notice that there are no FeatureRecords at all, and then create
> a record for the latest metadata.version. This is analogous to how we
> assume the latest IBP if no IBP is configured.
>
> There is also the question of how to create a cluster that starts up with
> something other than the latest metadata.version. We could have a config
> for that, like initial.metadata.version, or pass a flag to the
> controllers... alternately, we could pass a flag to "kafka-storage.sh
> format".
>
> > 14. After the quorum leader generates a new snapshot, how do we force
> other
> > nodes to pick up the new snapshot?
> >
> > 15. I agree with Jose that it will be useful to describe when generating
> a
> > new snapshot is needed. To me, it seems the new snapshot is only needed
> > when incompatible changes are made.
> >
>
> I think it would be good to always generate a snapshot right before the
> upgrade. Then, if the upgrade goes wrong, we have a metadata state we could
> revert back to, albeit with some effort and potential data loss. But, I
> agree that the rationale for this should be spelled out in the KIP.
>
> I also think that the brokers should generate their own snapshots rather
> than fetching from the controller, both in the upgrade and downgrade case.
> Jose mentioned this earlier and I agree.
>
> best,
> Colin
>
> > 7. Jose, which control records were you referring to?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org>
> wrote:
> >
> >> Jose, thanks for the thorough review and comments!
> >>
> >> I am out of the office until next week, so I probably won't be able to
> >> update the KIP until then. Here are some replies to your questions:
> >>
> >> 1. Generate snapshot on upgrade
> >> > > Metadata snapshot is generated and sent to the other nodes
> >> > Why does the Active Controller need to generate a new snapshot and
> >> > force a snapshot fetch from the replicas (inactive controller and
> >> > brokers) on an upgrade? Isn't writing the FeatureLevelRecord good
> >> > enough to communicate the upgrade to the replicas?
> >>
> >>
> >> You're right, we don't necessarily need to _transmit_ a snapshot, since
> >> each node can generate its own equivalent snapshot
> >>
> >> 2. Generate snapshot on downgrade
> >> > > Metadata snapshot is generated and sent to the other inactive
> >> > controllers and to brokers (this snapshot may be lossy!)
> >> > Why do we need to send this downgraded snapshot to the brokers? The
> >> > replicas have seen the FeatureLevelRecord and noticed the downgrade.
> >> > Can we have the replicas each independently generate a downgraded
> >> > snapshot at the offset for the downgraded FeatureLevelRecord? I assume
> >> > that the active controller will guarantee that all records after the
> >> > FeatureLevelRecord use the downgraded version. If so, it would be good
> >> > to mention that explicitly.
> >>
> >>
> >> Similar to above, yes a broker that detects a downgrade via
> >> FeatureLevelRecord could generate its own downgrade snapshot and reload
> its
> >> state from that. This does get a little fuzzy when we consider cases
> where
> >> brokers are on different software versions and could be generating a
> >> downgrade snapshot for version X, but using different versions of the
> code.
> >> It might be safer to let the controller generate the snapshot so each
> >> broker (regardless of software version) gets the same records. However,
> for
> >> upgrades (or downgrades) we expect the whole cluster to be running the
> same
> >> software version before triggering the metadata.version change, so
> perhaps
> >> this isn't a likely scenario. Thoughts?
> >>
> >>
> >> 3. Max metadata version
> >> > >For the first release that supports metadata.version, we can simply
> >> > initialize metadata.version with the current (and only) version. For
> >> future
> >> > releases, we will need a mechanism to bootstrap a particular version.
> >> This
> >> > could be done using the meta.properties file or some similar
> mechanism.
> >> The
> >> > reason we need to allow for a specific initial version is to support
> the
> >> > use case of starting a Kafka cluster at version X with an older
> >> > metadata.version.
> >>
> >>
> >> I assume that the Active Controller will learn the metadata version of
> >> > the broker through the BrokerRegistrationRequest. How will the Active
> >> > Controller learn about the max metadata version of the inactive
> >> > controller nodes? We currently don't send a registration request from
> >> > the inactive controller to the active controller.
> >>
> >>
> >> This came up during the design, but I neglected to add it to the KIP. We
> >> will need a mechanism for determining the supported features of each
> >> controller similar to how brokers use BrokerRegistrationRequest. Perhaps
> >> controllers could write a FeatureLevelRecord (or similar) to the
> metadata
> >> log indicating their supported version. WDYT?
> >>
> >> Why do you need to bootstrap a particular version? Isn't the intent
> >> > that the broker will learn the active metadata version by reading the
> >> > metadata before unfencing?
> >>
> >>
> >> This bootstrapping is needed for when a KRaft cluster is first started.
> If
> >> we don't have this mechanism, the cluster can't really do anything until
> >> the operator finalizes the metadata.version with the tool. The
> >> bootstrapping will be done by the controller and the brokers will see
> this
> >> version as a record (like you say). I'll add some text to clarify this.
> >>
> >>
> >> 4. Reject Registration - This is related to the bullet point above.
> >> > What will be the behavior of the active controller if the broker sends
> >> > a metadata version that is not compatible with the cluster wide
> >> > metadata version?
> >>
> >>
> >> If a broker starts up with a lower supported version range than the
> current
> >> cluster metadata.version, it should log an error and shutdown. This is
> in
> >> line with KIP-584.
> >>
> >> 5. Discover upgrade
> >> > > This snapshot will be a convenient way to let broker and controller
> >> > components rebuild their entire in-memory state following an upgrade.
> >> > Can we rely on the presence of the FeatureLevelRecord for the metadata
> >> > version for this functionality? If so, it avoids having to reload the
> >> > snapshot.
> >>
> >>
> >> For upgrades, yes probably since we won't need to "rewrite" any records
> in
> >> this case. For downgrades, we will need to generate the snapshot and
> reload
> >> everything.
> >>
> >> 6. Metadata version specification
> >> > >  V4(version=4, isBackwardsCompatible=false, description="New
> metadata
> >> > record type Bar"),
> >>
> >>
> >> Very cool. Do you have plans to generate Apache Kafka HTML
> >> > documentation for this information? Would be helpful to display this
> >> > information to the user using the kafka-features.sh and feature RPC?
> >>
> >>
> >> Hm good idea :) I'll add a brief section on documentation. This would
> >> certainly be very useful
> >>
> >> 7.Downgrade records
> >> > I think we should explicitly mention that the downgrade process will
> >> > downgrade both metadata records and controller records. In KIP-630 we
> >> > introduced two control records for snapshots.
> >>
> >>
> >> Yes, good call. Let me re-read that KIP and include some details.
> >>
> >> Thanks again for the comments!
> >> -David
> >>
> >>
> >> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio
> >> <jsan...@confluent.io.invalid> wrote:
> >>
> >> > One more comment.
> >> >
> >> > 7.Downgrade records
> >> > I think we should explicitly mention that the downgrade process will
> >> > downgrade both metadata records and controller records. In KIP-630 we
> >> > introduced two control records for snapshots.
> >> >
> >>
>


-- 
David Arthur
