Re: [DISCUSS] KIP-78: Cluster Id

Dong Lin Mon, 05 Sep 2016 11:22:42 -0700

Ismael,

I think you are saying that we can stop our discussion and simply take a
vote in which the majority decides. I don't think this is a good way to
find the best design for a KIP, and the discussion seems to be useless. It
doesn't seem that anyone other than you, Sumit and me is interested in
joining this discussion. I will leave it as it is after summarizing our
discussion.


There are 27 emails in this thread and I think most people wouldn't
bother to read all of them. So, instead of replying to your comments
inline, I will summarize our discussion here. All of the information below,
including my complete description of the alternative, can be found in the
previous emails. I will omit the discussion of other minor points --
interested readers can read our prior emails.

This KIP proposes using a randomly generated cluster.id. I am suggesting
that we give users the option to set cluster.id in the broker config. If
the user doesn't explicitly provide a value for cluster.id in the config,
or if the config value is an empty string, then the broker can use a
randomly generated cluster.id. Otherwise, the broker uses the cluster.id
from the config.
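
The fallback rule above can be sketched roughly as follows. This is only an
illustration, not code from the KIP or from Kafka: the helper name
resolve_cluster_id is hypothetical, uuid4 is a stand-in for whatever random
generation scheme the KIP specifies, and persisting the generated id is left
out.

```python
import uuid
from typing import Mapping


def resolve_cluster_id(broker_config: Mapping[str, str]) -> str:
    """Hypothetical helper: choose a cluster.id per the suggested alternative."""
    # A missing key and an empty string are treated the same way.
    configured = broker_config.get("cluster.id", "")
    if configured:
        # An explicitly configured (possibly human-readable) id is used as-is.
        return configured
    # Otherwise fall back to a randomly generated id.
    # Assumption: uuid4 stands in for the KIP's actual generation scheme.
    return uuid.uuid4().hex


readable = resolve_cluster_id({"cluster.id": "yosemite"})  # configured id
generated = resolve_cluster_id({"cluster.id": ""})         # random id
```

With this rule, a site with a few clusters can set a readable value, while a
site with many clusters can leave the config unset.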

My argument for the second approach is that it additionally allows users to
choose a human-readable cluster.id if they want one, while keeping the
benefits of the existing approach. Companies with only a few Kafka clusters
can choose a readable cluster.id. Companies with too many clusters can
choose not to configure cluster.id, so that a random cluster.id will be
generated.

There are two major concerns with this KIP:

- The KIP says that it is a requirement for cluster.id to be immutable.
Ismael suggests that reading cluster.id from the config doesn't meet this
requirement. However, the current approach described in the KIP doesn't
satisfy this requirement either. If a user deletes the znode that stores
cluster.id, either intentionally or by mistake, the cluster.id is lost
forever. The KIP should be updated to reflect this, and this requirement
cannot be used in the comparison between the different approaches.

- One of the arguments against reading cluster.id from the config is that
"unique and immutable auto-generated id + changeable human-readable name is
a better overall solution". Sumit describes the long-term plan to use
readable tags as well. However, the KIP doesn't describe the design for
using this readable name/tags. The KIP needs to provide more information
about this plan if it is used to argue for the existing approach against
an alternative.

Dong



On Mon, Sep 5, 2016 at 2:44 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Dong,
>
> Sumit responded to a number of points already, so I will try to be brief.
> See inline.
>
> Also, it may just be possible that we won't reach agreement. In that case,
> a vote may be a way to figure out if people feel that this proposal adds
> value in its current form or not.
>
> On Mon, Sep 5, 2016 at 12:54 AM, Dong Lin <lindon...@gmail.com> wrote:
>
> > I don't think having a human-readable name is equivalent to having a
> > meaningful name. It is not true that a human-readable name makes it more
> > likely that you will want to change it. Look, every city has a
> > human-readable name and we don't worry about changing its name. The
> > conference rooms in any company have human-readable names instead of
> > random ids. For the same reason you can name a cluster Yosemite and not
> > have to change it in the future.
> >
>
> As Sumit said, many cities have in fact changed their names. Incidentally,
> all the conference rooms at Confluent were recently renamed. So, this
> illustrates the point well. Yes, it is possible to give human-readable but
> not meaningful names. I still think that unique and immutable
> auto-generated id + changeable human-readable name is a better overall
> solution.
>
> > By immutable I think you are saying that we should prevent people from
> > changing cluster.id. However, this KIP doesn't really prevent this from
> > happening -- a user can delete the znode and restart Kafka to change
> > cluster.id. Therefore the requirement is not satisfied anyway.
> >
>
> Sure, we can't prevent users from deleting state in ZooKeeper or elsewhere
> if they have access to it. The idea is that users wouldn't need to with the
> auto-generated id.
>
> > I am also not sure why you want to prevent people from changing
> > cluster.id after reading the motivation section of this KIP. Is there any
> > motivation or use-case for this requirement?
> >
>
> I thought I explained this a few times. :) Sumit took a stab as well. The
> requirement is to reliably associate a message with a cluster. Each time
> the cluster id changes, you are basically "creating" a new cluster so it
> would look like messages are associated with 2 different clusters instead
> of a single one. This is an old database topic, of course: surrogate versus
> natural keys.
>
> > It is not clear why it would make downstream code more complex and the
> > feature less useful if we provide a default cluster.id here. Users who
> > are not interested in this feature can use the cluster.id and all
> > downstream applications will not be affected. Users who need this
> > feature can configure a unique human-readable cluster.id for their
> > clusters. In this case the downstream application will have the same
> > complexity as with the approach in this KIP. Did I miss something?
> >
>
> Can you please clarify what you mean by "default cluster.id"? I don't
> follow what you're saying in the comment above.
>
> > Right, there is no easy way to detect this automatically with Kafka. But
> > it is not a requirement to automatically detect violation of uniqueness
> > in the first place. SREs can manually make sure that a unique cluster.id
> > is given to each cluster in the broker config.
>
>
> We would like the feature to be useful across the board. Not all teams have
> a super capable team of SREs like LinkedIn. Some may not even have SREs at
> all. :)
>
> > I am not sure if it is weird. We can seek the views of other SREs and
> > developers to understand whether and why it is weird. I can ask our SREs
> > to comment as well. It is hard to evaluate whether "weirdness" outweighs
> > the benefits of being able to identify a cluster with a human-readable
> > cluster.id without knowing its impact on the use-case and user
> > experience.
> >
>
> It seems that a few things are being conflated here. You can set the
> cluster id manually in either proposal. The main differences are:
>
> 1. Whether the cluster id is auto-generated if not present (the KIP
> proposes auto-generation and, if I understand correctly, you are suggesting
> that it should not)
> 2. How the cluster id can be set manually (you'd have to set the relevant
> znode value with the KIP proposal whereas you are suggesting that it should
> be possible via a broker config)
> 3. The recommended workflow (the KIP suggests that you should just rely on
> the auto-generated id whereas you are suggesting that setting the value
> manually is a good idea).
>
>
> > Hmm.. you and Sumit provided two completely different requirements
> > regarding immutability and ease of change. I share a similar view with
> > Sumit on this issue. Of course we prefer to avoid changing the config.
> > But the one-time config change is probably not a big deal compared to
> > the long-term benefit that comes with a human-readable id in the
> > monitoring/auditing use-case.
> >
>
> As Sumit clarified, both of us are actually saying the same thing. I am
> quite confused when you say that you share a similar view with Sumit on
> this issue. :)
>
> Ismael
>
