Hi Colin,

Thanks for the feedback, and suggestions! It is a great idea to provide a
`--finalize-latest` flag. I agree it's a burden to ask the user to manually
upgrade each feature to the latest version, after a release.

I have now updated the KIP adding this idea.

> What about a simple solution to problem this where we add a flag to the
command-line tool like --enable-latest?  The command-line tool could query
what the highest possible versions for
> each feature were (using the API) and then make another RPC to enable the
latest features.

(Kowshik): I've updated the KIP with the above idea, please look at this
section (point #3 and the tooling example later):
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport


> I think this is actually much easier than the version number solution.
The version string solution requires us to maintain a complicated mapping
table between version strings and features.  > In practice, we also have
"internal versions" in ApiVersion.scala like 2.4IV0, 2.4IV1, and so on.
This isn't simple for users to understand or use.

> It's also hard to know what the difference is between different version
strings.  For example, there's actually no difference between 2.5IV0 and
2.4IV1, but you wouldn't know that unless you > read the comments in
ApiVersion.scala.  A system administrator who didn't know this might end up
doing a cluster roll to upgrade the IBP that turned out to be unnecessary.

(Kowshik): Yes, I can see the disadvantages!


Cheers,
Kowshik



On Mon, Apr 6, 2020 at 3:46 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Jun,
>
> I agree that asking the user to manually upgrade all features to the
> latest version is a burden.  Then the user has to know what the latest
> version of every feature is when upgrading.
>
> What about a simple solution to problem this where we add a flag to the
> command-line tool like --enable-latest?  The command-line tool could query
> what the highest possible versions for each feature were (using the API)
> and then make another RPC to enable the latest features.
>
> I think this is actually much easier than the version number solution.
> The version string solution requires us to maintain a complicated mapping
> table between version strings and features.  In practice, we also have
> "internal versions" in ApiVersion.scala like 2.4IV0, 2.4IV1, and so on.
> This isn't simple for users to understand or use.
>
> It's also hard to know what the difference is between different version
> strings.  For example, there's actually no difference between 2.5IV0 and
> 2.4IV1, but you wouldn't know that unless you read the comments in
> ApiVersion.scala.  A system administrator who didn't know this might end up
> doing a cluster roll to upgrade the IBP that turned out to be unnecessary.
>
> best,
> Colin
>
>
> On Mon, Apr 6, 2020, at 12:06, Jun Rao wrote:
> > Hi, Kowshik,
> >
> > Thanks for the reply. A few more replies below.
> >
> > 100.6 You can look for the sentence "This operation requires ALTER on
> > CLUSTER." in KIP-455. Also, you can check its usage in
> > KafkaApis.authorize().
> >
> > 110. From the external client/tooling perspective, it's more natural to
> use
> > the release version for features. If we can use the same release version
> > for internal representation, it seems simpler (easier to understand, no
> > mapping overhead, etc). Is there a benefit with separate external and
> > internal versioning schemes?
> >
> > 111. To put this in context, when we had IBP, the default value is the
> > current released version. So, if you are a brand new user, you don't need
> > to configure IBP and all new features will be immediately available in
> the
> > new cluster. If you are upgrading from an old version, you do need to
> > understand and configure IBP. I see a similar pattern here for
> > features. From the ease of use perspective, ideally, we shouldn't
> require a
> > new user to have an extra step such as running a bootstrap script unless
> > it's truly necessary. If someone has a special need (all the cases you
> > mentioned seem special cases?), they can configure a mode such that
> > features are enabled/disabled manually.
> >
> > Jun
> >
> > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <kpraka...@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the feedback and suggestions. Please find my response below.
> > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed to
> > > > issue that request if security is enabled. So, we need to assign the
> new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as an example.
> > >
> > > (Kowshik): I don't see any reference to the words ResourceType or
> > > AclOperations
> > > in the KIP. Please let me know how I can use the KIP that you linked to
> > > know how to
> > > setup the appropriate ResourceType and/or ClusterOperation?
> > >
> > > > 105. If we change delete to disable, it's better to do this
> consistently
> > > in
> > > > request protocol and admin api as well.
> > >
> > > (Kowshik): The API shouldn't be called 'disable' when it is deleting a
> > > feature.
> > > I've just changed the KIP to use 'delete'. I don't have a strong
> > > preference.
> > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > > for new features to be included in minor releases too. Should we
> make the
> > > > feature versioning match the release versioning?
> > >
> > > (Kowshik): The release version can be mapped to a set of feature
> versions,
> > > and this can be done, for example in the tool (or even external to the
> > > tool).
> > > Can you please clarify what I'm missing?
> > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > >
> > > (Kowshik): I agree that there is a trade-off here, but it will help
> > > to decide whether the automation can be thought through in the future
> > > in a follow up KIP, or right now in this KIP. We may invest
> > > in automation, but we have to decide whether we should do it
> > > now or later.
> > >
> > > For the inconvenience that you mentioned, do you think the problem
> that you
> > > mentioned can be  overcome by asking for the cluster operator to run a
> > > bootstrap script  when he/she knows that a specific AK release has been
> > > almost completely deployed in a cluster for the first time? Idea is
> that
> > > the
> > > bootstrap script will know how to map a specific AK release to
> finalized
> > > feature versions, and run the `kafka-features.sh` tool appropriately
> > > against
> > > the cluster.
> > >
> > > Now, coming back to your automation proposal/question.
> > > I do see the value of automated feature version finalization, but I
> also
> > > see
> > > that this will open up several questions and some risks, as explained
> > > below.
> > > The answers to these depend on the definition of the automation we
> choose
> > > to build, and how well does it fit into a kafka deployment.
> > > Basically, it can be unsafe for the controller to finalize feature
> version
> > > upgrades automatically, without learning about the intent of the
> cluster
> > > operator.
> > > 1. We would sometimes want to lock feature versions only when we have
> > > externally verified
> > > the stability of the broker binary.
> > > 2. Sometimes only the cluster operator knows that a cluster upgrade is
> > > complete,
> > > and new brokers are highly unlikely to join the cluster.
> > > 3. Only the cluster operator knows that the intent is to deploy the
> same
> > > version
> > > of the new broker release across the entire cluster (i.e. the latest
> > > downloaded version).
> > > 4. For downgrades, it appears the controller still needs some external
> > > input
> > > (such as the proposed tool) to finalize a feature version downgrade.
> > >
> > > If we have automation, that automation can end up failing in some of
> the
> > > cases
> > > above. Then, we need a way to declare that the cluster is "not ready"
> if
> > > the
> > > controller cannot automatically finalize some basic required feature
> > > version
> > > upgrades across the cluster. We need to make the cluster operator
> aware in
> > > such a scenario (raise an alert or alike).
> > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> instead of
> > > 48.
> > >
> > > (Kowshik): Done.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Kowshik,
> > > >
> > > > Thanks for the reply. A few more comments below.
> > > >
> > > > 100.6 For every new request, the admin needs to control who is
> allowed to
> > > > issue that request if security is enabled. So, we need to assign the
> new
> > > > request a ResourceType and possible AclOperations. See
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > > as
> > > > an example.
> > > >
> > > > 105. If we change delete to disable, it's better to do this
> consistently
> > > in
> > > > request protocol and admin api as well.
> > > >
> > > > 110. The minVersion/maxVersion for features use int64. Currently, our
> > > > release version schema is major.minor.bugfix (e.g. 2.5.0). It's
> possible
> > > > for new features to be included in minor releases too. Should we
> make the
> > > > feature versioning match the release versioning?
> > > >
> > > > 111. "During regular operations, the data in the ZK node can be
> mutated
> > > > only via a specific admin API served only by the controller." I am
> > > > wondering why can't the controller auto finalize a feature version
> after
> > > > all brokers are upgraded? For new users who download the latest
> version
> > > to
> > > > build a new cluster, it's inconvenient for them to have to manually
> > > enable
> > > > each feature.
> > > >
> > > > 112. DeleteFeaturesResponse: It seems the apiKey should be 49
> instead of
> > > > 48.
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam <
> kpraka...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Jun,
> > > > >
> > > > > Thanks a lot for the great feedback! Please note that the design
> > > > > has changed a little bit on the KIP, and we now propagate the
> finalized
> > > > > features metadata only via ZK watches (instead of
> UpdateMetadataRequest
> > > > > from the controller).
> > > > >
> > > > > Please find below my response to your questions/feedback, with the
> > > prefix
> > > > > "(Kowshik):".
> > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers,
> should we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > >
> > > > > (Kowshik): Great point! Done. I have added a timeout field. Note:
> we no
> > > > > longer
> > > > > wait for responses from brokers, since the design has been changed
> so
> > > > that
> > > > > the
> > > > > features information is propagated via ZK. Nevertheless, it is
> right to
> > > > > have a timeout
> > > > > for the request.
> > > > >
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > >
> > > > > (Kowshik): Great point! Yeah, I have modified it to just return an
> > > error
> > > > > code and a message.
> > > > > Previously it was not echoing the "request", rather it was
> returning
> > > the
> > > > > latest set of
> > > > > cluster-wide finalized features (after applying the updates). But
> you
> > > are
> > > > > right,
> > > > > the additional info is not required, so I have removed it from the
> > > > response
> > > > > schema.
> > > > >
> > > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > > features?
> > > > >
> > > > > (Kowshik): This is already present in the KIP via the
> > > 'DescribeFeatures'
> > > > > Admin API,
> > > > > which, underneath covers uses the ApiVersionsRequest to
> list/describe
> > > the
> > > > > existing features. Please read the 'Tooling support' section.
> > > > >
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> request. For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > DeleteFeaturesRequest
> > > > >
> > > > > (Kowshik): Great point! I have modified the KIP now to have 2
> separate
> > > > > controller APIs
> > > > > serving these different purposes:
> > > > > 1. updateFeatures
> > > > > 2. deleteFeatures
> > > > >
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > >
> > > > > (Kowshik): In the latest KIP write-up, it is called epoch (instead
> of
> > > > > version), and
> > > > > it is just the ZK node version. Basically, this is the epoch for
> the
> > > > > cluster-wide
> > > > > finalized feature version metadata. This metadata is served to
> clients
> > > > via
> > > > > the
> > > > > ApiVersionsResponse (for reads). We propagate updates from the
> > > > '/features'
> > > > > ZK node
> > > > > to all brokers, via ZK watches setup by each broker on the
> '/features'
> > > > > node.
> > > > >
> > > > > Now here is why the ordering is important:
> > > > > ZK watches don't propagate at the same time. As a result, the
> > > > > ApiVersionsResponse
> > > > > is eventually consistent across brokers. This can introduce cases
> > > > > where clients see an older lower epoch of the features metadata,
> after
> > > a
> > > > > more recent
> > > > > higher epoch was returned at a previous point in time. We expect
> > > clients
> > > > > to always employ the rule that the latest received higher epoch of
> > > > metadata
> > > > > always trumps an older smaller epoch. Those clients that are
> external
> > > to
> > > > > Kafka should strongly consider discovering the latest metadata once
> > > > during
> > > > > startup from the brokers, and if required refresh the metadata
> > > > periodically
> > > > > (to get the latest metadata).
> > > > >
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > >
> > > > > (Kowshik): What is ACL, and how could I find out which one to
> specify?
> > > > > Please could you provide me some pointers? I'll be glad to update
> the
> > > > > KIP once I know the next steps.
> > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > the json?
> > > > >
> > > > > (Kowshik): Great point! Done. I've increased the version in the
> broker
> > > > json
> > > > > by 1.
> > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > >
> > > > > (Kowshik): Great point! Done. I'm using the ZK node version now,
> > > instead
> > > > of
> > > > > explicitly
> > > > > incremented epoch.
> > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature
> (ex: can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > >
> > > > > (Kowshik): Not really. The text was just conveying that a broker
> could
> > > > > "know" of
> > > > > a new feature version, but it does not mean the broker should have
> also
> > > > > activated the effects of the feature version. Knowing vs activation
> > > are 2
> > > > > separate things,
> > > > > and the latter can be achieved by dynamic config. I have reworded
> the
> > > > text
> > > > > to
> > > > > make this clear to the reader.
> > > > >
> > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > >
> > > > > (Kowshik): With the new improved design, we have completely
> eliminated
> > > > the
> > > > > need to
> > > > > use UpdateMetadataRequest. This is because we now rely on ZK to
> deliver
> > > > the
> > > > > notifications for changes to the '/features' ZK node.
> > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > > better
> > > > > > to use enable/disable?
> > > > >
> > > > > (Kowshik): For delete, yes, I have changed it so that we instead
> call
> > > it
> > > > > 'disable'.
> > > > > However for 'update', it can now also refer to either an upgrade
> or a
> > > > > forced downgrade.
> > > > > Therefore, I have left it the way it is, just calling it as just
> > > > 'update'.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kowshik
> > > > >
> > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Kowshik,
> > > > > >
> > > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > > >
> > > > > > 100. UpdateFeaturesRequest/UpdateFeaturesResponse
> > > > > > 100.1 Since this request waits for responses from brokers,
> should we
> > > > add
> > > > > a
> > > > > > timeout in the request (like createTopicRequest)?
> > > > > > 100.2 The response schema is a bit weird. Typically, the response
> > > just
> > > > > > shows an error code and an error message, instead of echoing the
> > > > request.
> > > > > > 100.3 Should we add a separate request to list/describe the
> existing
> > > > > > features?
> > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE in a single
> request. For
> > > > > > DELETE, the version field doesn't make sense. So, I guess the
> broker
> > > > just
> > > > > > ignores this? An alternative way is to have a separate
> > > > > > DeleteFeaturesRequest
> > > > > > 100.5 In UpdateFeaturesResponse, we have "The monotonically
> > > increasing
> > > > > > version of the metadata for finalized features." I am wondering
> why
> > > the
> > > > > > ordering is important?
> > > > > > 100.6 Could you specify the required ACL for this new request?
> > > > > >
> > > > > > 101. For the broker registration ZK node, should we bump up the
> > > version
> > > > > in
> > > > > > the json?
> > > > > >
> > > > > > 102. For the /features ZK node, not sure if we need the epoch
> field.
> > > > Each
> > > > > > ZK node has an internal version field that is incremented on
> every
> > > > > update.
> > > > > >
> > > > > > 103. "Enabling the actual semantics of a feature version
> cluster-wide
> > > > is
> > > > > > left to the discretion of the logic implementing the feature
> (ex: can
> > > > be
> > > > > > done via dynamic broker config)." Does that mean the broker
> > > > registration
> > > > > ZK
> > > > > > node will be updated dynamically when this happens?
> > > > > >
> > > > > > 104. UpdateMetadataRequest
> > > > > > 104.1 It would be useful to describe when the feature metadata is
> > > > > included
> > > > > > in the request. My understanding is that it's only included if
> (1)
> > > > there
> > > > > is
> > > > > > a change to the finalized feature; (2) broker restart; (3)
> controller
> > > > > > failover.
> > > > > > 104.2 The new fields have the following versions. Why are the
> > > versions
> > > > 3+
> > > > > > when the top version is bumped to 6?
> > > > > >       "fields":  [
> > > > > >         {"name": "Name", "type":  "string", "versions":  "3+",
> > > > > >           "about": "The name of the feature."},
> > > > > >         {"name":  "Version", "type":  "int64", "versions":  "3+",
> > > > > >           "about": "The finalized version for the feature."}
> > > > > >       ]
> > > > > >
> > > > > > 105. kafka-features.sh: Instead of using update/delete, perhaps
> it's
> > > > > better
> > > > > > to use enable/disable?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik Prakasam <
> > > > kpraka...@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Boyang,
> > > > > > >
> > > > > > > Thanks for the great feedback! I have updated the KIP based on
> your
> > > > > > > feedback.
> > > > > > > Please find my response below for your comments, look for
> sentences
> > > > > > > starting
> > > > > > > with "(Kowshik)" below.
> > > > > > >
> > > > > > >
> > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > traffic"
> > > > > > could
> > > > > > > be
> > > > > > > > converted as "When is it safe for the brokers to start
> serving
> > > new
> > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier
> > > in
> > > > > the
> > > > > > > > context.
> > > > > > >
> > > > > > > (Kowshik): Great point! Done.
> > > > > > >
> > > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > > seems a
> > > > > > > > bit blurred. Could you point a reference to later section
> that we
> > > > > going
> > > > > > > to
> > > > > > > > store it in Zookeeper and update it every time when there is
> a
> > > > > feature
> > > > > > > > change?
> > > > > > >
> > > > > > > (Kowshik): Great point! Done. I've added a reference in the
> KIP.
> > > > > > >
> > > > > > >
> > > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > > KIP,
> > > > > for
> > > > > > > > features such as group coordinator semantics, there is no
> legal
> > > > > > scenario
> > > > > > > to
> > > > > > > > perform a downgrade at all. So having downgrade door open is
> > > pretty
> > > > > > > > error-prone as human faults happen all the time. I'm
> assuming as
> > > > new
> > > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > > feature
> > > > > > > > creation to indicate whether this feature is "downgradable".
> > > Could
> > > > > you
> > > > > > > > explain a bit more on the extra engineering effort for
> shipping
> > > > this
> > > > > > KIP
> > > > > > > > with downgrade protection in place?
> > > > > > >
> > > > > > > (Kowshik): Great point! I'd agree and disagree here. While I
> agree
> > > > that
> > > > > > > accidental
> > > > > > > downgrades can cause problems, I also think sometimes
> downgrades
> > > > should
> > > > > > > be allowed for emergency reasons (not all downgrades cause
> issues).
> > > > > > > It is just subjective to the feature being downgraded.
> > > > > > >
> > > > > > > To be more strict about feature version downgrades, I have
> modified
> > > > the
> > > > > > KIP
> > > > > > > proposing that we mandate a `--force-downgrade` flag be used
> in the
> > > > > > > UPDATE_FEATURES api
> > > > > > > and the tooling, whenever the human is downgrading a finalized
> > > > feature
> > > > > > > version.
> > > > > > > Hopefully this should cover the requirement, until we find the
> need
> > > > for
> > > > > > > advanced downgrade support.
> > > > > > >
> > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> will
> > > be
> > > > > > > defined
> > > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > > feature,
> > > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > > request
> > > > > > > > immediately, which introduces a time gap and the
> > > intended-to-close
> > > > > > > feature
> > > > > > > > could actually serve request during this phase. Do you think
> we
> > > > > should
> > > > > > > also
> > > > > > > > support configurations as well so that admin user could
> freely
> > > roll
> > > > > up
> > > > > > a
> > > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > > worrying
> > > > > > > > about the turnaround time to propagate the message only
> after the
> > > > > > cluster
> > > > > > > > starts up?
> > > > > > >
> > > > > > > (Kowshik): This is a great point/question. One of the
> expectations
> > > > out
> > > > > of
> > > > > > > this KIP, which is
> > > > > > > already followed in the broker, is the following.
> > > > > > >  - Imagine at time T1 the broker starts up and registers it’s
> > > > presence
> > > > > in
> > > > > > > ZK,
> > > > > > >    along with advertising it’s supported features.
> > > > > > >  - Imagine at a future time T2 the broker receives the
> > > > > > > UpdateMetadataRequest
> > > > > > >    from the controller, which contains the latest finalized
> > > features
> > > > as
> > > > > > > seen by
> > > > > > >    the controller. The broker validates this data against it’s
> > > > > supported
> > > > > > > features to
> > > > > > >    make sure there is no mismatch (it will shutdown if there
> is an
> > > > > > > incompatibility).
> > > > > > >
> > > > > > > It is expected that during the time between the 2 events T1
> and T2,
> > > > the
> > > > > > > broker is
> > > > > > > almost a silent entity in the cluster. It does not add any
> value to
> > > > the
> > > > > > > cluster, or carry
> > > > > > > out any important broker activities. By “important”, I mean it
> is
> > > not
> > > > > > doing
> > > > > > > mutations
> > > > > > > on it’s persistence, not mutating critical in-memory state,
> won’t
> > > be
> > > > > > > serving
> > > > > > > produce/fetch requests. Note it doesn’t even know it’s assigned
> > > > > > partitions
> > > > > > > until
> > > > > > > it receives UpdateMetadataRequest from controller. Anything the
> > > > broker
> > > > > is
> > > > > > > doing up
> > > > > > > until this point is not damaging/useful.
> > > > > > >
> > > > > > > I’ve clarified the above in the KIP, see this new section:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> > > > > > > .
> > > > > > >
> > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > Feature",
> > > > > > may
> > > > > > > be
> > > > > > > > I misunderstood something, I thought the features are
> defined in
> > > > > broker
> > > > > > > > code, so admin could not really create a new feature?
> > > > > > >
> > > > > > > (Kowshik): Great point! You understood this right. Here adding
> a
> > > > > feature
> > > > > > > means we are
> > > > > > > adding a cluster-wide finalized *max* version for a feature
> that
> > > was
> > > > > > > previously never finalized.
> > > > > > > I have clarified this in the KIP now.
> > > > > > >
> > > > > > > > 6. I think we need a separate error code like
> > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > to
> > > > > > > > reject a concurrent feature update request.
> > > > > > >
> > > > > > > (Kowshik): Great point! I have modified the KIP adding the
> above
> > > (see
> > > > > > > 'Tooling support -> Admin API changes').
> > > > > > >
> > > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > > > the
> > > > > > > > feature information through Zookeeper. Is that mentioned in
> the
> > > KIP
> > > > > to
> > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > >
> > > > > > > (Kowshik): Nice question! The broker reads finalized feature
> info
> > > > > stored
> > > > > > in
> > > > > > > ZK,
> > > > > > > only during startup when it does a validation. When serving
> > > > > > > `ApiVersionsRequest`, the
> > > > > > > broker does not read this info from ZK directly. I'd imagine
> the
> > > risk
> > > > > is
> > > > > > > that it can increase
> > > > > > > the ZK read QPS which can be a bottleneck for the system.
> Today, in
> > > > > Kafka
> > > > > > > we use the
> > > > > > > controller to fan out ZK updates to brokers and we want to
> stick to
> > > > > that
> > > > > > > pattern to avoid
> > > > > > > the ZK read bottleneck when serving `ApiVersionsRequest`.
> > > > > > >
> > > > > > > > 8. I was under the impression that user could configure a
> range
> > > of
> > > > > > > > supported versions, what's the trade-off for allowing single
> > > > > finalized
> > > > > > > > version only?
> > > > > > >
> > > > > > > (Kowshik): Great question! The finalized version of a feature
> > > > basically
> > > > > > > refers to
> > > > > > > the cluster-wide finalized feature "maximum" version. For
> example,
> > > if
> > > > > the
> > > > > > > 'group_coordinator' feature
> > > > > > > has the finalized version set to 10, then, it means that
> > > cluster-wide
> > > > > all
> > > > > > > versions upto v10 are
> > > > > > > supported for this feature. However, note that if some version
> (ex:
> > > > v0)
> > > > > > > gets deprecated
> > > > > > > for this feature, then we don’t convey that using this scheme
> (also
> > > > > > > supporting deprecation is a non-goal).
> > > > > > >
> > > > > > > (Kowshik): I’ve now modified the KIP at all points, refering to
> > > > > finalized
> > > > > > > feature "maximum" versions.
> > > > > > >
> > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> may be
> > > a
> > > > > > > producer
> > > > > > >
> > > > > > > (Kowshik): Great point! Done.
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kowshik
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang Chen <
> > > > > reluctanthero...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Kowshik,
> > > > > > > >
> > > > > > > > thanks for the revised KIP. Got a couple of questions:
> > > > > > > >
> > > > > > > > 1. "When is it safe for the brokers to begin handling EOS
> > > traffic"
> > > > > > could
> > > > > > > be
> > > > > > > > converted as "When is it safe for the brokers to start
> serving
> > > new
> > > > > > > > Exactly-Once(EOS) semantics" since EOS is not explained
> earlier
> > > in
> > > > > the
> > > > > > > > context.
> > > > > > > >
> > > > > > > > 2. In the *Explanation *section, the metadata version number
> part
> > > > > > seems a
> > > > > > > > bit blurred. Could you point a reference to later section
> that we
> > > > > going
> > > > > > > to
> > > > > > > > store it in Zookeeper and update it every time when there is
> a
> > > > > feature
> > > > > > > > change?
> > > > > > > >
> > > > > > > > 3. For the feature downgrade, although it's a Non-goal of the
> > > KIP,
> > > > > for
> > > > > > > > features such as group coordinator semantics, there is no
> legal
> > > > > > scenario
> > > > > > > to
> > > > > > > > perform a downgrade at all. So having downgrade door open is
> > > pretty
> > > > > > > > error-prone as human faults happen all the time. I'm
> assuming as
> > > > new
> > > > > > > > features are implemented, it's not very hard to add a flag
> during
> > > > > > feature
> > > > > > > > creation to indicate whether this feature is "downgradable".
> > > Could
> > > > > you
> > > > > > > > explain a bit more on the extra engineering effort for
> shipping
> > > > this
> > > > > > KIP
> > > > > > > > with downgrade protection in place?
> > > > > > > >
> > > > > > > > 4. "Each broker’s supported dictionary of feature versions
> will
> > > be
> > > > > > > defined
> > > > > > > > in the broker code." So this means in order to restrict a
> certain
> > > > > > > feature,
> > > > > > > > we need to start the broker first and then send a feature
> gating
> > > > > > request
> > > > > > > > immediately, which introduces a time gap and the
> > > intended-to-close
> > > > > > > feature
> > > > > > > > could actually serve request during this phase. Do you think
> we
> > > > > should
> > > > > > > also
> > > > > > > > support configurations as well so that admin user could
> freely
> > > roll
> > > > > up
> > > > > > a
> > > > > > > > cluster with all nodes complying the same feature gating,
> without
> > > > > > > worrying
> > > > > > > > about the turnaround time to propagate the message only
> after the
> > > > > > cluster
> > > > > > > > starts up?
> > > > > > > >
> > > > > > > > 5. "adding a new Feature, updating or deleting an existing
> > > > Feature",
> > > > > > may
> > > > > > > be
> > > > > > > > I misunderstood something, I thought the features are
> defined in
> > > > > broker
> > > > > > > > code, so admin could not really create a new feature?
> > > > > > > >
> > > > > > > > 6. I think we need a separate error code like
> > > > > > FEATURE_UPDATE_IN_PROGRESS
> > > > > > > to
> > > > > > > > reject a concurrent feature update request.
> > > > > > > >
> > > > > > > > 7. I think we haven't discussed the alternative solution to
> pass
> > > > the
> > > > > > > > feature information through Zookeeper. Is that mentioned in
> the
> > > KIP
> > > > > to
> > > > > > > > justify why using UpdateMetadata is more favorable?
> > > > > > > >
> > > > > > > > 8. I was under the impression that user could configure a
> range
> > > of
> > > > > > > > supported versions, what's the trade-off for allowing single
> > > > > finalized
> > > > > > > > version only?
> > > > > > > >
> > > > > > > > 9. One minor syntax fix: Note that here the "client" here
> may be
> > > a
> > > > > > > producer
> > > > > > > >
> > > > > > > > Boyang
> > > > > > > >
> > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin McCabe <
> cmcc...@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > On Thu, Mar 26, 2020, at 19:24, Kowshik Prakasam wrote:
> > > > > > > > > > Hi Colin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the feedback! I've changed the KIP to address
> your
> > > > > > > > > > suggestions.
> > > > > > > > > > Please find below my explanation. Here is a link to KIP
> 584:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > .
> > > > > > > > > >
> > > > > > > > > > 1. '__data_version__' is the version of the finalized
> feature
> > > > > > > metadata
> > > > > > > > > > (i.e. actual ZK node contents), while the
> > > '__schema_version__'
> > > > is
> > > > > > the
> > > > > > > > > > version of the schema of the data persisted in ZK. These
> > > serve
> > > > > > > > different
> > > > > > > > > > purposes. '__data_version__' is is useful mainly to
> clients
> > > > > during
> > > > > > > > reads,
> > > > > > > > > > to differentiate between the 2 versions of eventually
> > > > consistent
> > > > > > > > > 'finalized
> > > > > > > > > > features' metadata (i.e. larger metadata version is more
> > > > recent).
> > > > > > > > > > '__schema_version__' provides an additional degree of
> > > > > flexibility,
> > > > > > > > where
> > > > > > > > > if
> > > > > > > > > > we decide to change the schema for '/features' node in
> ZK (in
> > > > the
> > > > > > > > > future),
> > > > > > > > > > then we can manage broker roll outs suitably (i.e.
> > > > > > > > > > serialization/deserialization of the ZK data can be
> handled
> > > > > > safely).
> > > > > > > > >
> > > > > > > > > Hi Kowshik,
> > > > > > > > >
> > > > > > > > > If you're talking about a number that lets you know if
> data is
> > > > more
> > > > > > or
> > > > > > > > > less recent, we would typically call that an epoch, and
> not a
> > > > > > version.
> > > > > > > > For
> > > > > > > > > the ZK data structures, the word "version" is typically
> > > reserved
> > > > > for
> > > > > > > > > describing changes to the overall schema of the data that
> is
> > > > > written
> > > > > > to
> > > > > > > > > ZooKeeper.  We don't even really change the "version" of
> those
> > > > > > schemas
> > > > > > > > that
> > > > > > > > > much, since most changes are backwards-compatible.  But we
> do
> > > > > include
> > > > > > > > that
> > > > > > > > > version field just in case.
> > > > > > > > >
> > > > > > > > > I don't think we really need an epoch here, though, since
> we
> > > can
> > > > > just
> > > > > > > > look
> > > > > > > > > at the broker epoch.  Whenever the broker registers, its
> epoch
> > > > will
> > > > > > be
> > > > > > > > > greater than the previous broker epoch.  And the newly
> > > registered
> > > > > > data
> > > > > > > > will
> > > > > > > > > take priority.  This will be a lot simpler than adding a
> > > separate
> > > > > > epoch
> > > > > > > > > system, I think.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. Regarding admin client needing min and max
> information -
> > > you
> > > > > are
> > > > > > > > > right!
> > > > > > > > > > I've changed the KIP such that the Admin API also allows
> the
> > > > user
> > > > > > to
> > > > > > > > read
> > > > > > > > > > 'supported features' from a specific broker. Please look
> at
> > > the
> > > > > > > section
> > > > > > > > > > "Admin API changes".
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3. Regarding the use of `long` vs `Long` - it was not
> > > > deliberate.
> > > > > > > I've
> > > > > > > > > > improved the KIP to just use `long` at all places.
> > > > > > > > >
> > > > > > > > > Sounds good.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 4. Regarding kafka.admin.FeatureCommand tool - you are
> right!
> > > > > I've
> > > > > > > > > updated
> > > > > > > > > > the KIP sketching the functionality provided by this
> tool,
> > > with
> > > > > > some
> > > > > > > > > > examples. Please look at the section "Tooling support
> > > > examples".
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks, Kowshik.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Kowshik
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM Colin McCabe <
> > > > > cmcc...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks, Kowshik, this looks good.
> > > > > > > > > > >
> > > > > > > > > > > In the "Schema" section, do we really need both
> > > > > > __schema_version__
> > > > > > > > and
> > > > > > > > > > > __data_version__?  Can we just have a single version
> field
> > > > > here?
> > > > > > > > > > >
> > > > > > > > > > > Shouldn't the Admin(Client) function have some way to
> get
> > > the
> > > > > min
> > > > > > > and
> > > > > > > > > max
> > > > > > > > > > > information that we're exposing as well?  I guess we
> could
> > > > have
> > > > > > > min,
> > > > > > > > > max,
> > > > > > > > > > > and current.  Unrelated: is the use of Long rather than
> > > long
> > > > > > > > deliberate
> > > > > > > > > > > here?
> > > > > > > > > > >
> > > > > > > > > > > It would be good to describe how the command line tool
> > > > > > > > > > > kafka.admin.FeatureCommand will work.  For example the
> > > flags
> > > > > that
> > > > > > > it
> > > > > > > > > will
> > > > > > > > > > > take and the output that it will generate to STDOUT.
> > > > > > > > > > >
> > > > > > > > > > > cheers,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08, Kowshik Prakasam wrote:
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > I've opened KIP-584
> > > > > > <https://issues.apache.org/jira/browse/KIP-584> <
> > > > > > > https://issues.apache.org/jira/browse/KIP-584
> > > > > > > > >
> > > > > > > > > > > > which
> > > > > > > > > > > > is intended to provide a versioning scheme for
> features.
> > > > I'd
> > > > > > like
> > > > > > > > to
> > > > > > > > > use
> > > > > > > > > > > > this thread to discuss the same. I'd appreciate any
> > > > feedback
> > > > > on
> > > > > > > > this.
> > > > > > > > > > > > Here
> > > > > > > > > > > > is a link to KIP-584
> > > > > > <https://issues.apache.org/jira/browse/KIP-584>:
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > > > > > > > > > > >  .
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you!
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Kowshik
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to