Hey Colin,

Thanks for the hard work on this proposal.

I'm gradually coming over to the idea of the controllers having separate
IDs. One of the benefits is that it allows us to separate the notion of
controller liveness from broker liveness, which has always been a tricky
detail. I think it's fair to say that the liveness of the controller is
tracked through the Raft protocol while the liveness of the broker is
tracked by the heartbeating mechanism in this KIP. This saves us from having
to deal with tricky cases like electing a controller which is not actually
live from the perspective of heartbeating. I suspect you are right that it
will be simpler internally if we treat them distinctly, though it will take
some getting used to for users. It also seemingly makes it easier to
migrate to a dedicated controller setup once a cluster gets large enough
for it.
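
Just to make the separation concrete, I picture the config for a combined
node looking something like the sketch below (property names are
placeholders until the KIP settles on them):

    # hypothetical combined node
    process.roles=broker,controller
    # broker identity; liveness tracked via heartbeats to the active controller
    broker.id=1
    # controller identity; liveness tracked through the Raft protocol
    controller.id=3001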

With that said, one detail which is not very clear to me is how these two
IDs interact with the metadata quorum. Let's say we have a broker which is
running as both a broker and a controller. I think it must send fetches
using `controller.id` so that the leader can track its progress and it can
be eligible for election. Will it also fetch using `broker.id`? If we could
avoid replicating the metadata twice, that would be preferable, but it
seems like doing so would break the isolation between the controller and
the broker at some level.

There is another option I was thinking about for the sake of discussion.
Suppose we say that controllers always persist the metadata log and brokers
never do. A broker would always start from the latest snapshot, but we could
allow it to fetch from the "nearest" controller (which might be local if
`process.roles` is set to both controller and broker). If users want to
have the metadata log replicated to all nodes, then they can make each node
a controller. It's fine to have controllers that are not voters since they
can be observers. They replicate and persist the metadata log, but they
do not take part in elections, though they would be available for observer
promotion when we get around to completing the Raft reassignment work.
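
For example, a node which replicates and persists the metadata log but is
not eligible for election might be configured roughly like this (again,
just a sketch with illustrative names and values):

    # controller running as an observer rather than a voter
    process.roles=controller
    controller.id=3002
    # 3002 is not in the voter set, so it fetches and persists as an observer
    quorum.voters=3001@host1:9093,3003@host2:9093,3005@host3:9093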

On a related note, I'm still slightly in favor of unifying the controller
listener into the existing `listeners` configuration. I think we would
agree that separating the controller listener should at least be considered
a best practice in a secure configuration, but I am not sure about the case
for mandating it. I'm sympathetic to the idea that we should be opinionated
about this, but it would be helpful if the KIP documented what exactly we
are getting out of it. My concern is basically that it hurts usability.
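
To make the unified option concrete, here is roughly what I have in mind
(the listener names and the extra property are only illustrative):

    # controller listener declared alongside the broker listeners
    listeners=BROKER://:9092,CONTROLLER://:9093
    listener.security.protocol.map=BROKER:SSL,CONTROLLER:SSL
    # we would still need a way to identify the controller listener,
    # e.g. something along these lines
    controller.listener.names=CONTROLLER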

By the way, in KIP-595 we tended to favor `quorum` as the prefix for
configurations related to the management of the metadata quorum. None of
these configs have been exposed yet, so we can still change them. I think
`controller` is probably more descriptive, so if you're in agreement, we
can amend KIP-595 so that it uses the `controller` prefix. However, I think
it's important to redefine `controller.connect` so that it is an explicit
definition of the set of metadata voters. Right now it reads more like a
bootstrap URL. Since we decided to take dynamic quorum configuration out of
the initial version, we need the voter set to be defined explicitly in
static configuration. We used `quorum.voters` in KIP-595.
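
For reference, KIP-595 defines the voter set statically along these lines:

    # KIP-595 style: {id}@{host}:{port} for each voter
    quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093

If we adopt the `controller` prefix, I would expect something like
`controller.quorum.voters` with the same format, rather than a
bootstrap-style `controller.connect`.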

Best,
Jason

On Tue, Sep 29, 2020 at 5:21 PM Jose Garcia Sancio <jsan...@confluent.io>
wrote:

> Hi Jun and Colin,
>
> Some comments below.
>
> > 62.3 We added some configs in KIP-595 prefixed with "quorum" and we plan
> to
> > add some controller specific configs prefixed with "controller". KIP-630
> > plans to add some other controller specific configs with no prefix.
> Should
> > we standardize all controller specific configs with the same prefix?
>
> I agree that consistency in all of the new properties is really
> important to improve the user's experience with Kafka. After some
> discussion in the review of KIP-630, the configuration names start
> with "metadata".
>
> metadata.snapshot.min.cleanable.ratio
> metadata.lbo.lag.time.max.ms
>
> The reason for the change is that this configuration affects both the
> controller component and the metadata cache component as users of the
> "cluster metadata" topic partition. I think the new names matches
> Kafka's existing configuration pattern for the transaction and
> consumer offset topic partitions.
>
> Thanks!
> --
> -Jose
>
