Hi all,
Jun brought up a good point in his last email about tagged fields, and I've
updated the KIP to reflect that the changes to requests and responses will
be in the form of tagged fields to avoid changing IBP.

Jun: I plan on sending a followup email to address some of the other points.

Thanks,
Justine

On Mon, Sep 14, 2020 at 4:25 PM Jun Rao <j...@confluent.io> wrote:

> Hi, Justine,
>
> Thanks for the updated KIP. A few comments below.
>
> 10. LeaderAndIsr Response: Do we need the topic name?
>
> 11. For the changed request/response, other than LeaderAndIsr,
> UpdateMetadata, Metadata, do we need to include the topic name?
>
> 12. It seems that upgrades don't require IBP. Does that mean the new fields
> in all the request/response are added as tagged fields without bumping up
> the request version? It would be useful to make that clear.
>
> 13. Partition Metadata file: Do we need to include the topic name and the
> partition id since they are implied in the directory name?
>
> 14. In the JBOD mode, we support moving a partition's data from one disk to
> another. Will the new partition metadata file be copied during that
> process?
>
> 15. The KIP says "Remove deleted topics from replicas by sending
> StopReplicaRequest V2 for any topics which do not contain a topic ID, and
> V3 for any topics which do contain a topic ID.". However, it seems the
> updated controller will create all missing topic IDs first before doing
> other actions. So, is StopReplicaRequest V2 needed?
>
> Jun
>
> On Fri, Sep 11, 2020 at 10:31 AM John Roesler <vvcep...@apache.org> wrote:
>
> > Thanks, Justine!
> >
> > Your response seems compelling to me.
> >
> > -John
> >
> > On Fri, 2020-09-11 at 10:17 -0700, Justine Olshan wrote:
> > > Hello all,
> > > Thanks for continuing the discussion! I have a few responses to your
> > points.
> > >
> > > Tom: You are correct in that this KIP has not mentioned the
> > > DeleteTopicsRequest. I think that this would be out of scope for now,
> but
> > > may be something worth adding in the future.
> > >
> > > John: We did consider sequence ids, but there are a few reasons to
> favor
> > > UUIDs. There are several cases where topics from different clusters may
> > > interact now and in the future. For example, Mirror Maker 2 may benefit
> > > from being able to detect when a cluster being mirrored is deleted and
> > > recreated and globally unique identifiers would make resolving issues
> > > easier than sequence IDs which may collide between clusters. KIP-405
> > > (tiered storage) will also benefit from globally unique IDs as shared
> > > buckets may be used between clusters.
> > >
> > > Globally unique IDs would also make functionality like moving topics
> > > between disparate clusters easier in the future, simplify any future
> > > implementations of backups and restores, and more. In general, unique
> IDs
> > > would ensure that the source cluster topics do not conflict with the
> > > destination cluster topics.
> > >
> > > If we were to use sequence ids, we would need sufficiently large
> cluster
> > > ids to be stored with the topic identifiers or we run the risk of
> > > collisions. This will give up any advantage in compactness that
> sequence
> > > numbers may bring. Given these advantages I think it makes sense to use
> > > UUIDs.
> > >
> > > Gokul: This is an interesting idea, but this is a breaking change. Out
> of
> > > scope for now, but maybe worth discussing in the future.
> > >
> > > Hope this explains some of the decisions,
> > >
> > > Justine
> > >
> > >
> > >
> > > On Fri, Sep 11, 2020 at 8:27 AM Gokul Ramanan Subramanian <
> > > gokul24...@gmail.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > Have you thought about whether it makes sense to support authorizing
> a
> > > > principal for a topic ID rather than a topic name to achieve tighter
> > > > security?
> > > >
> > > > Or is the topic ID fundamentally an internal detail similar to epochs
> > used
> > > > in a bunch of other places in Kafka?
> > > >
> > > > Thanks.
> > > >
> > > > On Fri, Sep 11, 2020 at 4:06 PM John Roesler <vvcep...@apache.org>
> > wrote:
> > > >
> > > > > Hello Justine,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > I happen to have been confronted recently with the need to keep
> > track of
> > > > a
> > > > > large number of topics as compactly as possible. I was going to
> come
> > up
> > > > > with some way to dictionary encode the topic names as integers, but
> > this
> > > > > seems much better!
> > > > >
> > > > > Apologies if this has been raised before, but I’m wondering about
> the
> > > > > choice of UUID vs sequence number for the ids. Typically, I’ve seen
> > UUIDs
> > > > > in two situations:
> > > > > 1. When processes need to generate non-colliding identifiers
> without
> > > > > coordination.
> > > > > 2. When the identifier needs to be “universally unique”; I.e., the
> > > > > identifier needs to distinguish the entity from all other entities
> > that
> > > > > could ever exist. This is useful in cases where entities from all
> > kinds
> > > > of
> > > > > systems get mixed together, such as when dumping logs from all
> > processes
> > > > in
> > > > > a company into a common system.
> > > > >
> > > > > Maybe I’m being short-sighted, but it doesn’t seem like either
> really
> > > > > applies here. It seems like the brokers could and would achieve
> > consensus
> > > > > when creating a topic anyway, which is all that’s required to
> > generate
> > > > > non-colliding sequence ids. For the second, as you mention, we
> could
> > > > assign
> > > > > a UUID to the cluster as a whole, which would render any resource
> > scoped
> > > > to
> > > > > the broker universally unique as well.
> > > > >
> > > > > The reason I mention this is that, although a UUID is way more
> > compact
> > > > > than topic names, it’s still 16 bytes. In contrast, a 4-byte
> integer
> > > > > sequence id would give us 4 billion unique topics per cluster,
> which
> > > > seems
> > > > > like enough ;)
> > > > >
> > > > > Considering the number of different times these topic identifiers
> are
> > > > sent
> > > > > over the wire or stored in memory, it seems like it might be worth
> > the
> > > > > additional 4x space savings.
> > > > >
> > > > > What do you think about this?
> > > > >
> > > > > Thanks,
> > > > > John
> > > > >
> > > > > On Fri, Sep 11, 2020, at 03:20, Tom Bentley wrote:
> > > > > > Hi Justine,
> > > > > >
> > > > > > This looks like a very welcome improvement. Thanks!
> > > > > >
> > > > > > Maybe I missed it, but the KIP doesn't seem to mention changing
> > > > > > DeleteTopicsRequest to identify the topic using an id. Maybe
> > that's out
> > > > > of
> > > > > > scope, but DeleteTopicsRequest is not listed among the Future
> Work
> > APIs
> > > > > > either.
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > > On Thu, Sep 10, 2020 at 3:59 PM Satish Duggana <
> > > > satish.dugg...@gmail.com
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Lucas/Justine for the nice KIP.
> > > > > > >
> > > > > > > It has several benefits which also include simplifying the
> topic
> > > > > > > deletion process by controller and logs cleanup by brokers in
> > corner
> > > > > > > cases.
> > > > > > >
> > > > > > > Best,
> > > > > > > Satish.
> > > > > > >
> > > > > > > On Wed, Sep 9, 2020 at 10:07 PM Justine Olshan <
> > jols...@confluent.io
> > > > > > > wrote:
> > > > > > > > Hello all, it's been almost a year! I've made some changes to
> > this
> > > > > KIP
> > > > > > > and hope to continue the discussion.
> > > > > > > > One of the main changes I've added is now the metadata
> response
> > > > will
> > > > > > > include the topic ID (as Colin suggested). Clients can obtain
> the
> > > > > topicID
> > > > > > > of a given topic through a TopicDescription. The topicId will
> > also be
> > > > > > > included with the UpdateMetadata request.
> > > > > > > > Let me know what you all think.
> > > > > > > > Thank you,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On 2019/09/13 16:38:26, "Colin McCabe" <cmcc...@apache.org>
> > wrote:
> > > > > > > > > Hi Lucas,
> > > > > > > > >
> > > > > > > > > Thanks for tackling this.  Topic IDs are a great idea, and
> > this
> > > > is
> > > > > a
> > > > > > > really good writeup.
> > > > > > > > > For /brokers/topics/[topic], the schema version should be
> > bumped
> > > > to
> > > > > > > version 3, rather than 2.  KIP-455 bumped the version of this
> > znode
> > > > to
> > > > > 2
> > > > > > > already :)
> > > > > > > > > Given that we're going to be seeing these things as strings
> > as
> > > > lot
> > > > > (in
> > > > > > > logs, in ZooKeeper, on the command-line, etc.), does it make
> > sense to
> > > > > use
> > > > > > > base64 when converting them to strings?
> > > > > > > > > Here is an example of the hex representation:
> > > > > > > > > 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8
> > > > > > > > >
> > > > > > > > > And here is an example in base64.
> > > > > > > > > b8tRS7h4TJ2Vt43Dp85v2A
> > > > > > > > >
> > > > > > > > > The base64 version saves 15 letters (to be fair, 4 of those
> > were
> > > > > > > dashes that we could have elided in the hex representation.)
> > > > > > > > > Another thing to consider is that we should specify that
> the
> > > > > > > all-zeroes UUID is not a valid topic UUID.   We can't use null
> > for
> > > > this
> > > > > > > because we can't pass a null UUID over the RPC protocol (there
> > is no
> > > > > > > special pattern for null, nor do we want to waste space
> reserving
> > > > such
> > > > > a
> > > > > > > pattern.)
> > > > > > > > > Maybe I missed it, but did you describe "migration of...
> > existing
> > > > > > > topic[s] without topic IDs" in detail in any section?  It seems
> > like
> > > > > when
> > > > > > > the new controller becomes active, it should just generate
> random
> > > > > UUIDs for
> > > > > > > these, and write the random UUIDs back to ZooKeeper.  It would
> be
> > > > good
> > > > > to
> > > > > > > spell that out.  We should make it clear that this happens
> > regardless
> > > > > of
> > > > > > > the inter-broker protocol version (it's a compatible change).
> > > > > > > > > "LeaderAndIsrRequests including an is_every_partition flag"
> > > > seems a
> > > > > > > bit wordy.  Can we just call these "full LeaderAndIsrRequests"?
> > Then
> > > > > the
> > > > > > > RPC field could be named "full".  Also, it would probably be
> > better
> > > > > for the
> > > > > > > RPC field to be an enum of { UNSPECIFIED, INCREMENTAL, FULL },
> so
> > > > that
> > > > > we
> > > > > > > can cleanly handle old versions (by treating them as
> UNSPECIFIED)
> > > > > > > > > In the LeaderAndIsrRequest section, you write "A final
> > deletion
> > > > > event
> > > > > > > will be secheduled for X ms after the LeaderAndIsrRequest was
> > first
> > > > > > > received..."  I guess the X was a placeholder that you intended
> > to
> > > > > replace
> > > > > > > before posting? :)  In any case, this seems like the kind of
> > thing
> > > > we'd
> > > > > > > want a configuration for.  Let's describe that configuration
> key
> > > > > somewhere
> > > > > > > in this KIP, including what its default value is.
> > > > > > > > > We should probably also log a bunch of messages at WARN
> level
> > > > when
> > > > > > > something is scheduled for deletion, as well.  (Maybe this was
> > > > > assumed, but
> > > > > > > it would be good to mention it).
> > > > > > > > > I feel like there are a few sections that should be moved
> to
> > > > > "rejected
> > > > > > > alternatives."  For example, in the DeleteTopics section, since
> > we're
> > > > > not
> > > > > > > going to do option 1 or 2, these should be moved into "rejected
> > > > > > > alternatives,"  rather than appearing inline.  Another case is
> > the
> > > > > "Should
> > > > > > > we remove topic name from the protocol where possible" section.
> > This
> > > > > is
> > > > > > > clearly discussing a design alternative that we're not
> proposing
> > to
> > > > > > > implement: removing the topic name from those protocols.
> > > > > > > > > Is it really necessary to have a new
> > /admin/delete_topics_by_id
> > > > > path
> > > > > > > in ZooKeeper?  It seems like we don't really need this.
> Whenever
> > > > > there is
> > > > > > > a new controller, we'll send out full LeaderAndIsrRequests
> which
> > will
> > > > > > > trigger the stale topics to be cleaned up.   The active
> > controller
> > > > will
> > > > > > > also send the full LeaderAndIsrRequest to brokers that are just
> > > > > starting
> > > > > > > up.    So we don't really need this kind of two-phase commit
> > (send
> > > > out
> > > > > > > StopReplicasRequest, get ACKs from all nodes, commit by
> removing
> > > > > > > /admin/delete_topics node) any more.
> > > > > > > > > You mention that FetchRequest will now include UUID to
> avoid
> > > > issues
> > > > > > > where requests are made to stale partitions.  However, adding a
> > UUID
> > > > to
> > > > > > > MetadataRequest is listed as future work, out of scope for this
> > KIP.
> > > > > How
> > > > > > > will the client learn what the topic UUID is, if the metadata
> > > > response
> > > > > > > doesn't include that information?  It seems like adding the
> UUID
> > to
> > > > > > > MetadataResponse would be an improvement here that might not be
> > too
> > > > > hard to
> > > > > > > make.
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote:
> > > > > > > > > > Lucas, this would be great. I've run into issues with
> > topics
> > > > > being
> > > > > > > > > > resurrected accidentally, since a client cannot easily
> > > > > distinguish
> > > > > > > between
> > > > > > > > > > a deleted topic and a new topic with the same name. I'd
> > need
> > > > the
> > > > > ID
> > > > > > > > > > accessible from the client to solve that issue, but this
> > is a
> > > > > good
> > > > > > > first
> > > > > > > > > > step.
> > > > > > > > > >
> > > > > > > > > > Ryanne
> > > > > > > > > >
> > > > > > > > > > On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet <
> > > > > lu...@confluent.io>
> > > > > > > wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I would like to kick off discussion of KIP-516, an
> > > > > implementation
> > > > > > > of topic
> > > > > > > > > > > IDs for Kafka. Topic IDs aim to solve topic uniqueness
> > > > > problems in
> > > > > > > Kafka,
> > > > > > > > > > > where referring to a topic by name alone is
> insufficient.
> > > > Such
> > > > > > > cases
> > > > > > > > > > > include when a topic has been deleted and recreated
> with
> > the
> > > > > same
> > > > > > > name.
> > > > > > > > > > > Unique identifiers will help simplify and improve
> Kafka's
> > > > topic
> > > > > > > deletion
> > > > > > > > > > > process, as well as prevent cases where brokers may
> > > > incorrectly
> > > > > > > interact
> > > > > > > > > > > with stale versions of topics.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
> > > > > > > > > > > Looking forward to your thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> >
> >
>

Reply via email to