Thanks for the discussion everyone, I'm going to move ahead with the
vote for this KIP.

-David

On Thu, Nov 3, 2022 at 1:20 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
> Hi, David,
>
> Thanks for the reply.
>
> 20/21 Yes, but separate listeners are optional. It's possible for the nodes
> to use a single port for both client and server side communications.
>
> Thanks,
>
> Jun
>
> On Thu, Nov 3, 2022 at 9:59 AM David Arthur
> <david.art...@confluent.io.invalid> wrote:
>
> > 20/21, in combined mode we still have a separate listener for the
> > controller APIs, e.g.,
> >
> > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> >
> > inter.broker.listener.name=PLAINTEXT
> >
> > controller.listener.names=CONTROLLER
> >
> > advertised.listeners=PLAINTEXT://localhost:9092
> >
> >
> >
> > Clients still talk to the broker through the advertised listener, and only
> > brokers and other controllers will communicate over the controller
> > listener.
> >
> > 40. Sounds good, I updated the KIP
> >
> > Thanks!
> > David
> >
> >
> >
> > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao <j...@confluent.io.invalid> wrote:
> >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 20/21. When KRaft runs in the combined mode, does a controller know
> > whether
> > > an ApiRequest is from a client or another broker?
> > >
> > > 40. Adding a "None" state sounds reasonable.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > > <david.art...@confluent.io.invalid> wrote:
> > >
> > > > Jun,
> > > >
> > > > 20/21 If we use a tagged field, then I don't think clients need to be
> > > > concerned with it, right?. In ApiVersionsResponse sent by brokers to
> > > > clients, this field would be omitted. Clients can't connect directly to
> > > the
> > > > KRaft controller nodes. Also, we have a precedent of sending controller
> > > > node state between controllers through ApiVersions ("metadata.version"
> > > > feature), so I think it fits well.
> > > >
> > > > 40. For new KRaft clusters, we could add another state to indicate it
> > was
> > > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > > "kafka.controller:type=KafkaController,name=MigrationState" => "None" ?
> > > We
> > > > could also omit that metric for unmigrated clusters, but I'm not a fan
> > of
> > > > using the absence of a value to signal something.
> > > >
> > > > -----
> > > >
> > > > Akhilesh, thanks for reviewing the KIP!
> > > >
> > > > 1. MigrationState and MetadataType are mostly the same on the
> > controller,
> > > > but we do have the "MigratingZkData" state that seems useful to report
> > > as a
> > > > metric. Aside from looking at logs, observing the controller in this
> > > state
> > > > is the only way to see how long its taking to copy data from ZK.
> > > >
> > > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's
> > question
> > > > about non-migrated clusters. I think it's useful to have a distinct
> > > > MigrationState for clusters that have been migrated and those that were
> > > > never migrated. This does mean we'll report the MigrationState long
> > after
> > > > the migration is complete, but we can drop these metrics in 4.0 once ZK
> > > is
> > > > removed.
> > > >
> > > > 2. The "ZkMigrationReady" will indicate that the controller has
> > > > "kafka.metadata.migration.enable" _and_ the ZK configs set. We need
> > some
> > > > way to indicate that the whole quorum is correctly configured to handle
> > > the
> > > > migration so we don't failover to a controller that's not configured
> > for
> > > > ZK. Did I understand your question correctly?
> > > >
> > > > 3. Yea, good idea. While the KRaft controller has
> > > > MigrationState=MigrationIneligible, we could also report
> > > >
> > >
> > "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
> > > > It might be useful to report ineligible controllers as well since that
> > > can
> > > > prevent the migration from starting.
> > > >
> > > > 4. I think I covered this in "Incompatible Brokers". We effectively
> > fence
> > > > these brokers by not sending them metadata RPCs.
> > > >
> > > > Thanks!
> > > > David
> > > >
> > > >
> > > > On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti <
> > akhilesh....@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > >
> > > > > Thanks for the KIP. I have some questions/suggestions.
> > > > >
> > > > >
> > > > > 1) I see two new metrics:
> > > > > kafka.controller:type=KafkaController,name=MetadataType and
> > > > > kafka.controller:type=KafkaController,name=MigrationState. Won't the
> > > > second
> > > > > metric already cover the cases of the first metric? Also, instead of
> > > > > MigrationFinalized, we could directly say the state is KRaftMode. So
> > we
> > > > can
> > > > > use the same value for default KRaft clusters.
> > > > >
> > > > >
> > > > > 2) ZkMigrationReady in ApiVersionsResponse from KRaft Controller. By
> > > > > default, we plan to start the Controller quorum in "
> > > > > *kafka.metadata.migration.enable*" config set to true. Then do we
> > need
> > > > this
> > > > > additional information again to make sure The controllers are ready
> > for
> > > > > migration? What would happen if the Controller assumes it is ready
> > for
> > > > > migration from 3.4 by default if it doesn't see both
> > MigrationMetadata
> > > > > records?
> > > > >
> > > > >
> > > > > 3) I see that we do not impose order on rolling the brokers with
> > > > migration
> > > > > flags and provisioning the controller quorum. Along with the KRaft
> > > > > controller emitting migration state metrics, it may be better to emit
> > > the
> > > > > broker count for the brokers not ready for migration yet. This will
> > > give
> > > > us
> > > > > more insight into any roll issues.
> > > > >
> > > > >
> > > > > 4) Once the KRaft controller is in migration mode, we should also
> > > > > prevent/handle ZkBrokerRegistrations that don't enable migration
> > mode.
> > > > >
> > > > >
> > > > > Thanks
> > > > > Akhilesh
> > > > >
> > > > >
> > > > > On Tue, Nov 1, 2022 at 10:49 AM Jun Rao <j...@confluent.io.invalid>
> > > > wrote:
> > > > >
> > > > > > Hi, David,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 20/21. Regarding the new ZkMigrationReady field in
> > > ApiVersionsResponse,
> > > > > it
> > > > > > seems that this is a bit intrusive since it exposes unneeded info
> > to
> > > > the
> > > > > > clients. Another option is to add that field as part of the Fetch
> > > > > request.
> > > > > > We can choose to only set that field in the very first Fetch
> > request
> > > > > from a
> > > > > > Quorum follower.
> > > > > >
> > > > > > 40. For kafka.controller:type=KafkaController,name=MigrationState,
> > > what
> > > > > is
> > > > > > the value for a brand new KRaft cluster?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Mon, Oct 31, 2022 at 2:35 PM David Arthur
> > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > >
> > > > > > > 30. I think we can keep the single ControllerId field in those
> > > > requests
> > > > > > > since they are only used for fencing (as far as I know).
> > > Internally,
> > > > > the
> > > > > > > broker components that handle those requests will compare the
> > > > > > ControllerId
> > > > > > > with that of MetadataCache (which is updated via UMR).
> > > > > > >
> > > > > > > The reason we need the separate KRaftControllerId in the
> > > > UpdateMetadata
> > > > > > > code path so that we can have different connection behavior for a
> > > > KRaft
> > > > > > > controller vs ZK controller.
> > > > > > >
> > > > > > > 31. It seems reasonable to keep the MigrationRecord in the
> > > snapshot.
> > > > I
> > > > > > was
> > > > > > > thinking the same thing in terms of understanding the loss for a
> > > > > > > migration-after-finalization. However, once a snapshot has been
> > > taken
> > > > > > that
> > > > > > > includes the final MigrationRecord, we can't easily see which
> > > records
> > > > > > came
> > > > > > > after it.
> > > > > > >
> > > > > > > 32. You're correct, we can just use the modify time from the
> > Stat.
> > > > The
> > > > > > > other two fields are primarily informational and are there for
> > > > > operators
> > > > > > > who want to inspect the state of the migration. They aren't
> > > required
> > > > > for
> > > > > > > correctness
> > > > > > >
> > > > > > > 33. Yes that's right. I detail that in "Controller Leadership"
> > > > section
> > > > > > >
> > > > > > > 34. Right, I'll fix that.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > > On Mon, Oct 31, 2022 at 2:55 PM Jun Rao <j...@confluent.io.invalid
> > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi, David,
> > > > > > > >
> > > > > > > > Thanks for the updated KIP. A few more comments.
> > > > > > > >
> > > > > > > > 30. LeaderAndIsrRequest/StopReplicaRequest both have a
> > > controllerId
> > > > > > > field.
> > > > > > > > Should we add a KRaftControllerId field like UpdateMetadata?
> > > > > > > >
> > > > > > > > 31. "If a migration has been finalized, but the KRaft quroum
> > > comes
> > > > up
> > > > > > > with
> > > > > > > > kafka.metadata.migration.enable, we must not re-enter the
> > > migration
> > > > > > mode.
> > > > > > > > In this case, while replaying the log, the controller can see
> > the
> > > > > > second
> > > > > > > > MigrationRecord and know that the migration is finalized and
> > > should
> > > > > not
> > > > > > > be
> > > > > > > > resumed. " Hmm, do we want to keep the MigrationRecord in the
> > > > > snapshot
> > > > > > > and
> > > > > > > > the metadata log forever after migration is finalized? If not,
> > we
> > > > > can't
> > > > > > > > know for sure whether a migration has happened or not. Also, it
> > > > might
> > > > > > be
> > > > > > > > useful to support switching back to ZK mode after the migration
> > > is
> > > > > > > > finalized, with the understanding of potential metadata loss.
> > In
> > > > that
> > > > > > > case,
> > > > > > > > we could just trim all metadata log and recopy the ZK metadata
> > > > back.
> > > > > > > >
> > > > > > > > 32. The /migration node in ZK: Do we need last_update_time_ms
> > > since
> > > > > ZK
> > > > > > > > Stats already has an MTime? Also, how do we plan to use the
> > > > > > > > kraft_controller_id and kraft_controller_epoch fields?
> > > > > > > >
> > > > > > > > 33. Controller migration: We will force a write to the
> > > > "/controller"
> > > > > > and
> > > > > > > > "/controller_epoch" ZNodes before copying ZK data, right?
> > > > > > > >
> > > > > > > > 34. "Operator can remove the persistent "/controller" and
> > > > > > > > "/controller_epoch" nodes allowing for ZK controller election
> > to
> > > > take
> > > > > > > > place". I guess the operator only needs to remove the
> > /controller
> > > > > path?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Mon, Oct 31, 2022 at 7:17 AM David Arthur
> > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > >
> > > > > > > > > Happy Monday, everyone! I've updated the KIP with the
> > following
> > > > > > > changes:
> > > > > > > > >
> > > > > > > > > * Clarified MetadataType metric usages (broker vs controller)
> > > > > > > > > * Added ZkMigrationReady tagged field to ApiVersionsResponse
> > > (for
> > > > > use
> > > > > > > by
> > > > > > > > > KRaft controller quorum)
> > > > > > > > > * Added MigrationRecord with two states: Started and Finished
> > > > > > > > > * Documented ZK configs for KRaft controller
> > > > > > > > > * Simplified state machine description (internally, more
> > states
> > > > > will
> > > > > > > > exist,
> > > > > > > > > but only the four documented are interesting to operators)
> > > > > > > > > * Clarified some things in Controller Migration section
> > > > > > > > > * Removed KRaft -> ZK parts of Broker Registration
> > > > > > > > > * Added Misconfigurations section to Failure Modes
> > > > > > > > >
> > > > > > > > > Let me know if I've missed anything from the past two weeks
> > of
> > > > > > > > discussion.
> > > > > > > > >
> > > > > > > > > Thanks again to everyone who has reviewed this KIP so far!
> > > > > > > > >
> > > > > > > > > -David
> > > > > > > > >
> > > > > > > > > On Fri, Oct 28, 2022 at 2:26 PM Jun Rao
> > > <j...@confluent.io.invalid
> > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, David,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply.
> > > > > > > > > >
> > > > > > > > > > 20/21. Sounds good.
> > > > > > > > > >
> > > > > > > > > > Could you update the doc with all the changes being
> > > discussed?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Fri, Oct 28, 2022 at 10:11 AM David Arthur
> > > > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > Jun,
> > > > > > > > > > >
> > > > > > > > > > > 20/21. I was also wondering about a "migration" record.
> > In
> > > > > > addition
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > scenario you mentioned, we also need a way to prevent the
> > > > > cluster
> > > > > > > > from
> > > > > > > > > > > re-entering the dual write mode after the migration has
> > > been
> > > > > > > > > finalized. I
> > > > > > > > > > > could see this happening inadvertently via a change in
> > some
> > > > > > > > > configuration
> > > > > > > > > > > management system. How about we add a record that marks
> > the
> > > > > > > beginning
> > > > > > > > > and
> > > > > > > > > > > end of the dual-write mode. The first occurrence of the
> > > > record
> > > > > > > could
> > > > > > > > be
> > > > > > > > > > > included in the metadata transaction when we migrate data
> > > > from
> > > > > > ZK.
> > > > > > > > > > >
> > > > > > > > > > > With this, the active controller would decide whether to
> > > > enter
> > > > > > dual
> > > > > > > > > write
> > > > > > > > > > > mode, finalize the migration based, or fail based on:
> > > > > > > > > > >
> > > > > > > > > > > * Metadata log state
> > > > > > > > > > > * It's own configuration
> > > ("kafka.metadata.migration.enable",
> > > > > > > > > > > "zookeeper.connect", etc)
> > > > > > > > > > > * The other controllers configuration (via
> > > > ApiVersionsResponse)
> > > > > > > > > > >
> > > > > > > > > > > WDYT?
> > > > > > > > > > >
> > > > > > > > > > > 22. Since we will need the fencing anyways as a
> > safe-guard,
> > > > > then
> > > > > > I
> > > > > > > > > agree
> > > > > > > > > > > would could skip the registration of KRaft brokers in ZK
> > to
> > > > > > simply
> > > > > > > > > > things a
> > > > > > > > > > > bit.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > David
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Oct 27, 2022 at 5:11 PM Jun Rao
> > > > > <j...@confluent.io.invalid
> > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, David,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reply.
> > > > > > > > > > > >
> > > > > > > > > > > > 20/21. Relying upon the presence of ZK configs to
> > > determine
> > > > > > > whether
> > > > > > > > > the
> > > > > > > > > > > > KRaft controller is in a dual write mode seems a bit
> > > error
> > > > > > prone.
> > > > > > > > If
> > > > > > > > > > > > someone accidentally adds a ZK configuration to a brand
> > > new
> > > > > > KRaft
> > > > > > > > > > > cluster,
> > > > > > > > > > > > ideally it shouldn't cause the controller to get into a
> > > > weird
> > > > > > > > state.
> > > > > > > > > > Have
> > > > > > > > > > > > we considered storing the migration state in a metadata
> > > > > record?
> > > > > > > > > > > >
> > > > > > > > > > > > 22. If we have the broker fencing logic, do we need to
> > > > write
> > > > > > the
> > > > > > > > > broker
> > > > > > > > > > > > registration path in ZK for KRaft brokers at all?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Oct 27, 2022 at 1:02 PM David Arthur
> > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Jun,
> > > > > > > > > > > > >
> > > > > > > > > > > > > 20/21. A KRaft controller will recover the migration
> > > > state
> > > > > by
> > > > > > > > > reading
> > > > > > > > > > > the
> > > > > > > > > > > > > "/migration" ZNode. If the migration enable config is
> > > > set,
> > > > > > and
> > > > > > > > the
> > > > > > > > > ZK
> > > > > > > > > > > > > migration is complete, it will enter the dual-write
> > > mode.
> > > > > > > Before
> > > > > > > > an
> > > > > > > > > > > > > operator can decommission ZK, they will need to
> > > finalize
> > > > > the
> > > > > > > > > > migration
> > > > > > > > > > > > > which involves removing the migration config and the
> > ZK
> > > > > > config.
> > > > > > > > > I'll
> > > > > > > > > > > > > clarify this in the KIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 22. Yea, we could see an incorrect broker ID during
> > > that
> > > > > > > window.
> > > > > > > > > If
> > > > > > > > > > we
> > > > > > > > > > > > > ended up with a state where we saw a ZK broker ID
> > that
> > > > > > > conflicted
> > > > > > > > > > with
> > > > > > > > > > > a
> > > > > > > > > > > > > KRaft broker ID, we would need to fence one of them.
> > I
> > > > > would
> > > > > > > > > probably
> > > > > > > > > > > opt
> > > > > > > > > > > > > to fence the KRaft broker in that case since broker
> > > > > > > registration
> > > > > > > > > and
> > > > > > > > > > > > > fencing is more robust in KRaft. Hopefully this is a
> > > rare
> > > > > > case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 26. Sounds good.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > David
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Oct 27, 2022 at 1:34 PM Jun Rao
> > > > > > > <j...@confluent.io.invalid
> > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, David,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the reply. A few more comments.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 20/21. Using a tagged field in ApiVersionRequest
> > > could
> > > > > > work.
> > > > > > > > > > Related
> > > > > > > > > > > to
> > > > > > > > > > > > > > this, how does a KRaft controller know that it's in
> > > the
> > > > > > dual
> > > > > > > > > write
> > > > > > > > > > > > mode?
> > > > > > > > > > > > > > Does it need to read the /controller path from ZK?
> > > > After
> > > > > > the
> > > > > > > > > > > migration,
> > > > > > > > > > > > > > people may have the ZK cluster decommissioned, but
> > > > still
> > > > > > have
> > > > > > > > the
> > > > > > > > > > ZK
> > > > > > > > > > > > > > configs left in the KRaft controller. Will this
> > cause
> > > > the
> > > > > > > KRaft
> > > > > > > > > > > > > controller
> > > > > > > > > > > > > > to be stuck because it doesn't know which mode it
> > is
> > > > in?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 22. Using the ephemeral node matches the current
> > > > ZK-based
> > > > > > > > broker
> > > > > > > > > > > > behavior
> > > > > > > > > > > > > > better. However, it leaves a window for incorrect
> > > > broker
> > > > > > > > > > registration
> > > > > > > > > > > > to
> > > > > > > > > > > > > > sneak in during KRaft controller failover.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 26. Then, we could just remove Broker Registration
> > in
> > > > > that
> > > > > > > > > section.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Oct 26, 2022 at 2:21 PM David Arthur
> > > > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 20/21 It could definitely cause problems if we
> > > > failover
> > > > > > to
> > > > > > > a
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > > without "kafka.metadata.migration.enable". The
> > only
> > > > > > > > mechanism I
> > > > > > > > > > > know
> > > > > > > > > > > > of
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > controllers to learn things about one another is
> > > > > > > ApiVersions.
> > > > > > > > > We
> > > > > > > > > > > > > > currently
> > > > > > > > > > > > > > > use this for checking support for
> > > "metadata.version"
> > > > > (in
> > > > > > > > KRaft
> > > > > > > > > > > mode).
> > > > > > > > > > > > > We
> > > > > > > > > > > > > > > could add a "zk.migration" feature flag that's
> > > > enabled
> > > > > > on a
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > only
> > > > > > > > > > > > > > > if the config is set. Another possibility would
> > be
> > > a
> > > > > > tagged
> > > > > > > > > field
> > > > > > > > > > > on
> > > > > > > > > > > > > > > ApiVersionResponse that indicated if the config
> > was
> > > > > set.
> > > > > > > Both
> > > > > > > > > > seem
> > > > > > > > > > > > > > somewhat
> > > > > > > > > > > > > > > inelegant. I think a tagged field would be a bit
> > > > > simpler
> > > > > > > (and
> > > > > > > > > > > > arguably
> > > > > > > > > > > > > > less
> > > > > > > > > > > > > > > hacky).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For 20, we could avoid entering the migration
> > state
> > > > > > unless
> > > > > > > > the
> > > > > > > > > > > whole
> > > > > > > > > > > > > > quorum
> > > > > > > > > > > > > > > had the field present in their NodeApiVersions.
> > For
> > > > 21,
> > > > > > we
> > > > > > > > > could
> > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > leaving the migration state unless the whole
> > quorum
> > > > did
> > > > > > not
> > > > > > > > > have
> > > > > > > > > > > the
> > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > in their NodeApiVersions. Do you think this would
> > > be
> > > > > > > > > sufficient?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 22. Right, we need to write the broker info back
> > to
> > > > ZK
> > > > > > just
> > > > > > > > as
> > > > > > > > > a
> > > > > > > > > > > > > > safeguard
> > > > > > > > > > > > > > > against incorrect broker IDs getting registered
> > > into
> > > > > ZK.
> > > > > > I
> > > > > > > > was
> > > > > > > > > > > > thinking
> > > > > > > > > > > > > > > these would be persistent nodes, but it's
> > probably
> > > > fine
> > > > > > to
> > > > > > > > make
> > > > > > > > > > > them
> > > > > > > > > > > > > > > ephemeral and have the active KRaft controller
> > keep
> > > > > them
> > > > > > up
> > > > > > > > to
> > > > > > > > > > > date.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 23. Right. When the broker comes up as a KRaft
> > > > broker,
> > > > > > it's
> > > > > > > > old
> > > > > > > > > > > > > > > /brokers/ids ZNode will be gone and it will
> > > register
> > > > > > itself
> > > > > > > > > with
> > > > > > > > > > > the
> > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > controller. The controller will know that it is
> > now
> > > > in
> > > > > > > KRaft
> > > > > > > > > mode
> > > > > > > > > > > and
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > stop sending it the old RPCs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 24. Ok, I'll add these
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 25. I realize now the "/controller_epoch" node is
> > > > > already
> > > > > > > > > > > persistent.
> > > > > > > > > > > > > It
> > > > > > > > > > > > > > > should be sufficient to remove the "/controller"
> > > node
> > > > > to
> > > > > > > > > trigger
> > > > > > > > > > an
> > > > > > > > > > > > > > > election.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 26. Hmm, not sure, but I don't think it uses
> > > watches.
> > > > > > > > > KafkaServer
> > > > > > > > > > > > > > registers
> > > > > > > > > > > > > > > the broker info and later loads all brokers in
> > the
> > > > > > cluster
> > > > > > > to
> > > > > > > > > > check
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > the endpoints don't conflict. Things that do use
> > > > > watches
> > > > > > > are
> > > > > > > > > > > dynamic
> > > > > > > > > > > > > > > configs, ACLs, and some others (listed in the
> > KIP).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Oct 26, 2022 at 4:26 PM David Arthur <
> > > > > > > > > > > > > david.art...@confluent.io>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Luke and Andrew, thanks for taking a look!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think the names of the state machine in the
> > KIP
> > > > > > aren't
> > > > > > > > the
> > > > > > > > > > > best.
> > > > > > > > > > > > > I'll
> > > > > > > > > > > > > > > > try to improve that section in general.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. "MigrationReady" is probably better named
> > > > > > > > > "MigratingFromZk"
> > > > > > > > > > or
> > > > > > > > > > > > > > > > something. It's meant to be the state when the
> > > > KRaft
> > > > > > > > > controller
> > > > > > > > > > > is
> > > > > > > > > > > > > > > actively
> > > > > > > > > > > > > > > > migrating the data out of ZK. This happens
> > after
> > > we
> > > > > > have
> > > > > > > > > > detected
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > cluster is eligible and before we enter the
> > dual
> > > > > write
> > > > > > > > mode.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. "MigrationActive" should probably be called
> > > > > > > > > > "BrokerMigration".
> > > > > > > > > > > > > > > > Transitioning to "MigrationFinished" can happen
> > > > > > > > automatically
> > > > > > > > > > > when
> > > > > > > > > > > > > all
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > known ZK brokers have been migrated. Since we
> > > don't
> > > > > > have
> > > > > > > > > > > permanent
> > > > > > > > > > > > > > > > registrations of ZK brokers, we can use the
> > > > partition
> > > > > > > > > > assignments
> > > > > > > > > > > > as
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > proxy for this.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. A metric for unready brokers makes sense. We
> > > can
> > > > > > also
> > > > > > > > log
> > > > > > > > > a
> > > > > > > > > > > > > message
> > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > the controller when it tries to start the
> > > migration
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. The MetadataType metric is meant to help see
> > > > this.
> > > > > > > E.g.,
> > > > > > > > > > some
> > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > would have MetadataType=ZK, some would have
> > > > > > > > MetadataType=Dual
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 5. On ZK brokers, not having this config set
> > > would
> > > > > > > prevent
> > > > > > > > > the
> > > > > > > > > > > new
> > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > registration data from being written to ZK.
> > This
> > > > > means
> > > > > > > the
> > > > > > > > > > KRaft
> > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > won't send RPCs to it. If a broker has been
> > > > migrated
> > > > > to
> > > > > > > > KRaft
> > > > > > > > > > > > > already,
> > > > > > > > > > > > > > > I'm
> > > > > > > > > > > > > > > > not sure there is any harm in removing this
> > > config.
> > > > > If
> > > > > > we
> > > > > > > > > > decide
> > > > > > > > > > > > that
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > need to guarantee that KRaft brokers have this
> > > > config
> > > > > > set
> > > > > > > > > > during
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > migration, we can include it in the broker
> > > > > registration
> > > > > > > > > that's
> > > > > > > > > > > sent
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > KRaft. This would let the controller keep that
> > > > broker
> > > > > > > > fenced.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 6. Once the KRaft controller takes leadership,
> > > the
> > > > ZK
> > > > > > > > > > controller
> > > > > > > > > > > > > won't
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > active any more and will stop reporting the
> > > > > > > MigrationState
> > > > > > > > > > > metric.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 7. The "Dual" MetadataType is reported by
> > brokers
> > > > > > running
> > > > > > > > in
> > > > > > > > > > > KRaft
> > > > > > > > > > > > > mode
> > > > > > > > > > > > > > > > when the migration enable config is set. I
> > think
> > > > the
> > > > > > > > > controller
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > report this, not just the brokers. I'll clarify
> > > in
> > > > > the
> > > > > > > KIP.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Andrew:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > How will the code get all the ZooKeeper
> > config?
> > > > > Will
> > > > > > it
> > > > > > > > do
> > > > > > > > > > some
> > > > > > > > > > > > > sort
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > scan of the ZooKeeper data store? ... Do we
> > need
> > > to
> > > > > do
> > > > > > > > > anything
> > > > > > > > > > > > > special
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > make sure we get all data from ZooKeeper?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > With a few exceptions, all data written to ZK
> > by
> > > > > Kafka
> > > > > > > > > happens
> > > > > > > > > > on
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > controller (single writer principle). In our
> > > > > migration
> > > > > > > > code,
> > > > > > > > > we
> > > > > > > > > > > can
> > > > > > > > > > > > > > > > unconditionally update the "/controller" and
> > > > > > > > > > "/controller_epoch"
> > > > > > > > > > > > > ZNodes
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > effectively force the migration component to
> > gain
> > > > > > > > leadership
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > controller. Once this happens, we don't expect
> > > any
> > > > > data
> > > > > > > to
> > > > > > > > be
> > > > > > > > > > > > written
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > ZK, so we can read it iteratively without
> > > worrying
> > > > > > about
> > > > > > > > any
> > > > > > > > > > > > > concurrent
> > > > > > > > > > > > > > > > writes.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > As for the linearizable thing, I believe this
> > > just
> > > > > > means
> > > > > > > > that
> > > > > > > > > > > reads
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > served by a quorum follower which has stale
> > data.
> > > > > We'll
> > > > > > > be
> > > > > > > > > > > reading
> > > > > > > > > > > > > from
> > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > the same way the ZK controller does, so I think
> > > we
> > > > > will
> > > > > > > be
> > > > > > > > > fine
> > > > > > > > > > > > > > regarding
> > > > > > > > > > > > > > > > consistency. If we wanted to be extra careful,
> > we
> > > > > could
> > > > > > > > add a
> > > > > > > > > > > delay
> > > > > > > > > > > > > > prior
> > > > > > > > > > > > > > > > to iterating through the znodes to give
> > > partitioned
> > > > > ZK
> > > > > > > > > > followers
> > > > > > > > > > > a
> > > > > > > > > > > > > > chance
> > > > > > > > > > > > > > > > to get kicked out of the quorum. I don't think
> > > > we'll
> > > > > > need
> > > > > > > > > that
> > > > > > > > > > > > though
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > What happens if we commit the transaction
> > then
> > > > fail
> > > > > > > right
> > > > > > > > > > after
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > restart?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If we commit the "migration" transaction to the
> > > > > > metadata
> > > > > > > > log,
> > > > > > > > > > it
> > > > > > > > > > > > > won't
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > a problem if we failover or restart. We can
> > > recover
> > > > > our
> > > > > > > > state
> > > > > > > > > > > based
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > metadata log and what exists in ZK. If we see a
> > > > > > migration
> > > > > > > > > > > > transaction
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > the kraft log, but no "/migration" ZNode, we'll
> > > > know
> > > > > we
> > > > > > > > > failed
> > > > > > > > > > > > before
> > > > > > > > > > > > > > > > writing to ZK. If we see a partial transaction
> > in
> > > > the
> > > > > > > log,
> > > > > > > > > then
> > > > > > > > > > > we
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > abort it and restart the migration.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This sort of leads to me wondering if we
> > > > > will/should
> > > > > > > > check
> > > > > > > > > > the
> > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > log state before doing the migration? That
> > could
> > > > also
> > > > > > > catch
> > > > > > > > > > > > mistakes
> > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > as if a KRaft quorum is used that already had
> > > some
> > > > > > > metadata
> > > > > > > > > > > which I
> > > > > > > > > > > > > > > assume
> > > > > > > > > > > > > > > > we want to prevent.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes, we should check that the metadata log is
> > > empty
> > > > > > > before
> > > > > > > > > > > > > attempting a
> > > > > > > > > > > > > > > > migration (perhaps excepting the bootstrap
> > > metadata
> > > > > --
> > > > > > > need
> > > > > > > > > to
> > > > > > > > > > > > think
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > that one still).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Oct 24, 2022 at 8:57 AM Andrew Grant
> > > > > > > > > > > > > > <agr...@confluent.io.invalid
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >> Hey David,
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Thanks for the KIP. I had a few small
> > questions.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> “The ZK data migration will copy the existing
> > ZK
> > > > > data
> > > > > > > into
> > > > > > > > > the
> > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> metadata log and establish the new KRaft
> > active
> > > > > > > controller
> > > > > > > > > as
> > > > > > > > > > > the
> > > > > > > > > > > > > > active
> > > > > > > > > > > > > > > >> controller from a ZK perspective.”
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> How will the code get all the ZooKeeper
> > config?
> > > > Will
> > > > > > it
> > > > > > > do
> > > > > > > > > > some
> > > > > > > > > > > > sort
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > >> scan of the ZooKeeper data store? Also I’m
> > not a
> > > > > > > ZooKeeper
> > > > > > > > > > > expert
> > > > > > > > > > > > > but
> > > > > > > > > > > > > > > per
> > > > > > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > https://zookeeper.apache.org/doc/current/zookeeperInternals.html
> > > > > > > > > > > > > > “Read
> > > > > > > > > > > > > > > >> operations in ZooKeeper are *not linearizable*
> > > > since
> > > > > > > they
> > > > > > > > > can
> > > > > > > > > > > > return
> > > > > > > > > > > > > > > >> potentially stale data.” Do we need to do
> > > anything
> > > > > > > special
> > > > > > > > > to
> > > > > > > > > > > make
> > > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > >> get all data from ZooKeeper?
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> “For the initial migration, the controller
> > will
> > > > > > utilize
> > > > > > > > > > KIP-868
> > > > > > > > > > > > > > Metadata
> > > > > > > > > > > > > > > >> Transactions
> > > > > > > > > > > > > > > >> <
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > >> write all of the ZK metadata in a single
> > > > > transaction.
> > > > > > If
> > > > > > > > the
> > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > >> fails before this transaction is finalized,
> > the
> > > > next
> > > > > > > > active
> > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > >> will
> > > > > > > > > > > > > > > >> abort the transaction and restart the
> > migration
> > > > > > > process.”
> > > > > > > > > What
> > > > > > > > > > > > > happens
> > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > >> we commit the transaction then fail right
> > after
> > > > that
> > > > > > and
> > > > > > > > > > > restart?
> > > > > > > > > > > > > This
> > > > > > > > > > > > > > > >> sort
> > > > > > > > > > > > > > > >> of leads to me wondering if we will/should
> > check
> > > > the
> > > > > > > > > metadata
> > > > > > > > > > > log
> > > > > > > > > > > > > > state
> > > > > > > > > > > > > > > >> before doing the migration? That could also
> > > catch
> > > > > > > mistakes
> > > > > > > > > > such
> > > > > > > > > > > as
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > >> KRaft quorum is used that already had some
> > > > metadata
> > > > > > > which
> > > > > > > > I
> > > > > > > > > > > assume
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > want
> > > > > > > > > > > > > > > >> to prevent.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> For the Test Plan section do you think it’s
> > > worth
> > > > > > > calling
> > > > > > > > > out
> > > > > > > > > > > > > > > performance
> > > > > > > > > > > > > > > >> testing migrations of ZooKeeper deployments
> > that
> > > > > have
> > > > > > > > > “large”
> > > > > > > > > > > > > amounts
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > >> metadata?
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Andrew
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> On Mon, Oct 24, 2022 at 3:20 AM Luke Chen <
> > > > > > > > > show...@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> > Hi David,
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > Thanks for the good and complicated
> > proposal!
> > > :)
> > > > > > > > > > > > > > > >> > Some questions:
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 1. "MigrationReady" state: The KIP said:
> > > > > > > > > > > > > > > >> > The KRaft quorum has been started
> > > > > > > > > > > > > > > >> > I don't think we'll automatically enter this
> > > > state
> > > > > > > after
> > > > > > > > > > KRaft
> > > > > > > > > > > > > > quorum
> > > > > > > > > > > > > > > >> > started, do we?
> > > > > > > > > > > > > > > >> > I think KRaft active controller should take
> > > over
> > > > > > > > > leadership
> > > > > > > > > > by
> > > > > > > > > > > > > > > writing a
> > > > > > > > > > > > > > > >> > znode in /controller path before we entering
> > > > this
> > > > > > > sate,
> > > > > > > > is
> > > > > > > > > > > that
> > > > > > > > > > > > > > > correct?
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 2. "MigrationActive" state: the KIP said:
> > > > > > > > > > > > > > > >> > ZK state has been migrated, controller is in
> > > > > > > dual-write
> > > > > > > > > > mode,
> > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > >> are
> > > > > > > > > > > > > > > >> > being restarted in KRaft mode
> > > > > > > > > > > > > > > >> > What confuses me is the last part: "brokers
> > > are
> > > > > > being
> > > > > > > > > > > restarted
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> > mode".
> > > > > > > > > > > > > > > >> > How could we detect brokers are being
> > > restarted
> > > > in
> > > > > > > KRaft
> > > > > > > > > > mode?
> > > > > > > > > > > > Old
> > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > >> > broker is removed and new KRaft broker is up
> > > > > within
> > > > > > N
> > > > > > > > > > minutes?
> > > > > > > > > > > > > > > >> > I think we don't have to rely on the
> > condition
> > > > > > > "brokers
> > > > > > > > > are
> > > > > > > > > > > > being
> > > > > > > > > > > > > > > >> restarted
> > > > > > > > > > > > > > > >> > in KRaft mode" to enter this state.
> > > > > > > > > > > > > > > >> > "brokers are being restarted in KRaft mode"
> > > > should
> > > > > > be
> > > > > > > a
> > > > > > > > > > > process
> > > > > > > > > > > > > > > happened
> > > > > > > > > > > > > > > >> > between "MigrationActive" and
> > > > "MigrationFinished".
> > > > > > > Does
> > > > > > > > > that
> > > > > > > > > > > > make
> > > > > > > > > > > > > > > sense?
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 3. When "Zookeeper" mode trying to enter
> > > > > > > > > > "MigrationEligible",
> > > > > > > > > > > if
> > > > > > > > > > > > > > there
> > > > > > > > > > > > > > > >> is a
> > > > > > > > > > > > > > > >> > cluster with tens of brokers, how could
> > users
> > > > know
> > > > > > why
> > > > > > > > > this
> > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > >> cannot
> > > > > > > > > > > > > > > >> > be in "MigrationEligible" state, yet? Check
> > > that
> > > > > > znode
> > > > > > > > > > > manually
> > > > > > > > > > > > > one
> > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > >> one?
> > > > > > > > > > > > > > > >> > Or do we plan to create a tool to help them?
> > > Or
> > > > > > maybe
> > > > > > > > > expose
> > > > > > > > > > > the
> > > > > > > > > > > > > > > >> "unready
> > > > > > > > > > > > > > > >> > ZK brokers" metrics?
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 4. Same for "MigrationActive" entering
> > > > > > > > "MigrationFinished"
> > > > > > > > > > > > state.
> > > > > > > > > > > > > > > Since
> > > > > > > > > > > > > > > >> > that will be some process for restarting ZK
> > > > broker
> > > > > > > into
> > > > > > > > > > KRaft
> > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > >> one by
> > > > > > > > > > > > > > > >> > one, could we expose the "remaining ZK
> > > brokers"
> > > > as
> > > > > > > > > metrics?
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 5. When users are in
> > > > > > > > > > > > > > > >>
> > > > > "MigrationReady"/"MigrationActive"/"MigrationFinished"
> > > > > > > > > > > > > > > >> > states, and they accidentally change the
> > KRaft
> > > > > > > > controller
> > > > > > > > > > > > config:
> > > > > > > > > > > > > > > >> > "kafka.metadata.migration.enable"
> > > > > > > > > > > > > > > >> > to false. What will happen to this cluster?
> > No
> > > > > > > > dual-write
> > > > > > > > > > for
> > > > > > > > > > > > it?
> > > > > > > > > > > > > > Will
> > > > > > > > > > > > > > > >> we
> > > > > > > > > > > > > > > >> > have any protection for it?
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 6. About the "MigrationState" metric:
> > > > > > > > > > > > > > > >> > The "ZooKeeper" and "MigrationEligible" is
> > > > > reported
> > > > > > by
> > > > > > > > ZK
> > > > > > > > > > > > > > controller,
> > > > > > > > > > > > > > > >> and
> > > > > > > > > > > > > > > >> > the rest of states are reported by KRaft
> > > > > controller
> > > > > > > -->
> > > > > > > > > > makes
> > > > > > > > > > > > > sense
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > >> me.
> > > > > > > > > > > > > > > >> > One question from it is, when KRaft
> > controller
> > > > > takes
> > > > > > > > over
> > > > > > > > > > the
> > > > > > > > > > > > > > > leadership
> > > > > > > > > > > > > > > >> > from ZK controller, what will the
> > > > "MigrationState"
> > > > > > > value
> > > > > > > > > in
> > > > > > > > > > > old
> > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > >> > controller? keep in "MigrationEligible"
> > > doesn't
> > > > > make
> > > > > > > > > sense.
> > > > > > > > > > > Will
> > > > > > > > > > > > > > there
> > > > > > > > > > > > > > > >> be a
> > > > > > > > > > > > > > > >> > empty or null state?
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > 7. About the "MetadataType" metric:
> > > > > > > > > > > > > > > >> > An enumeration of: ZooKeeper (1), Dual (2),
> > > > KRaft
> > > > > > (3).
> > > > > > > > > Each
> > > > > > > > > > > > broker
> > > > > > > > > > > > > > > >> reports
> > > > > > > > > > > > > > > >> > this.
> > > > > > > > > > > > > > > >> > I don't know how we could map the migration
> > > > state
> > > > > to
> > > > > > > > > these 3
> > > > > > > > > > > > > types.
> > > > > > > > > > > > > > > >> > What is the metadataType when cluster in
> > > > > > > > "MigrationReady"
> > > > > > > > > > > state?
> > > > > > > > > > > > > > Still
> > > > > > > > > > > > > > > >> > Zookeeper?
> > > > > > > > > > > > > > > >> > When will brokers enter Dual type?
> > > > > > > > > > > > > > > >> > This is unclear in the KIP.
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > Thank you.
> > > > > > > > > > > > > > > >> > Luke
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > On Thu, Oct 20, 2022 at 11:33 PM David
> > Arthur
> > > > > > > > > > > > > > > >> > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > > Igor, thanks for taking a look! Since JBOD
> > > in
> > > > > > KRaft
> > > > > > > is
> > > > > > > > > > still
> > > > > > > > > > > > > under
> > > > > > > > > > > > > > > >> > > discussion and not likely to land before
> > the
> > > > ZK
> > > > > > > > > > migration, I
> > > > > > > > > > > > > think
> > > > > > > > > > > > > > > >> we'll
> > > > > > > > > > > > > > > >> > > need to defer it. For migrating JBOD
> > > clusters
> > > > > from
> > > > > > > ZK
> > > > > > > > to
> > > > > > > > > > > > KRaft,
> > > > > > > > > > > > > > > we'll
> > > > > > > > > > > > > > > >> > also
> > > > > > > > > > > > > > > >> > > need to address the log dir failure
> > > mechanism
> > > > > > which
> > > > > > > > > > > currently
> > > > > > > > > > > > > > uses a
> > > > > > > > > > > > > > > >> > > special ZNode written to by the brokers.
> > > There
> > > > > is
> > > > > > an
> > > > > > > > old
> > > > > > > > > > KIP
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > > > > > > > > > > > >> > > which proposes a new RPC to move that ZK
> > > write
> > > > > to
> > > > > > > the
> > > > > > > > > > > > > controller,
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > >> I'm
> > > > > > > > > > > > > > > >> > > not sure if that's the approach we'll want
> > > to
> > > > > > take.
> > > > > > > I
> > > > > > > > > read
> > > > > > > > > > > > > through
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > > discussion over in KIP-858 and it sounds
> > > like
> > > > > > there
> > > > > > > > are
> > > > > > > > > > some
> > > > > > > > > > > > > good
> > > > > > > > > > > > > > > >> ideas
> > > > > > > > > > > > > > > >> > > there.
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > To answer your question more directly,
> > > > migrating
> > > > > > ZK
> > > > > > > > > > clusters
> > > > > > > > > > > > > using
> > > > > > > > > > > > > > > >> JBOD
> > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > >> > > out of scope for this KIP. It might be
> > > > possible
> > > > > to
> > > > > > > > stop
> > > > > > > > > > > using
> > > > > > > > > > > > > JBOD
> > > > > > > > > > > > > > > >> using
> > > > > > > > > > > > > > > >> > > reassignments, but I'm not sure about
> > that.
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > -David
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > On Tue, Oct 18, 2022 at 12:17 PM Igor
> > > Soarez <
> > > > > > > > > i...@soarez.me
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > > Hi David,
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > Thanks for the KIP, this is very
> > exciting!
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > How does JBOD relate to this work? KRaft
> > > > mode
> > > > > > > > doesn't
> > > > > > > > > > yet
> > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > >> > > > configuring brokers with multiple log
> > > > > > directories.
> > > > > > > > If
> > > > > > > > > > the
> > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > >> > > > existing cluster are configured with
> > > > multiple
> > > > > > log
> > > > > > > > > dirs,
> > > > > > > > > > > does
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > > migration
> > > > > > > > > > > > > > > >> > > > imply that the existing brokers need to
> > > drop
> > > > > use
> > > > > > > of
> > > > > > > > > that
> > > > > > > > > > > > > > feature?
> > > > > > > > > > > > > > > >> Or is
> > > > > > > > > > > > > > > >> > > > there some way to upgrade them later?
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > > > >> > > > Igor
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > On Mon, Oct 17, 2022, at 10:07 PM, David
> > > > > Arthur
> > > > > > > > wrote:
> > > > > > > > > > > > > > > >> > > > > I've updated the KIP with the
> > following
> > > > > > changes
> > > > > > > > (the
> > > > > > > > > > > > > > confluence
> > > > > > > > > > > > > > > >> diff
> > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > >> > > > not
> > > > > > > > > > > > > > > >> > > > > helpful here since i rearranged some
> > > > things)
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > > * Added ZooKeeperBlockingKRaftMillis
> > > > metric
> > > > > > > > > > > > > > > >> > > > > * Added section on new broker
> > > registration
> > > > > > JSON
> > > > > > > > > > > > > > > >> > > > > * Removed section on MigrationCheck
> > RPC
> > > > > > > > > > > > > > > >> > > > > * Added change to
> > UpdateMetadataRequest
> > > > > > > > > > > > > > > >> > > > > * Added section "Additional ZK Broker
> > > > > Configs"
> > > > > > > > > > (includes
> > > > > > > > > > > > > > configs
> > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > >> > > > connect
> > > > > > > > > > > > > > > >> > > > > to KRaft quorum)
> > > > > > > > > > > > > > > >> > > > > * Added section on "Incompatible
> > > Brokers"
> > > > > > under
> > > > > > > > > > Failure
> > > > > > > > > > > > > Modes
> > > > > > > > > > > > > > > >> > > > > * Clarified many things per this
> > > > discussion
> > > > > > > thread
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > > I realized we need the KRaft
> > controller
> > > to
> > > > > > pick
> > > > > > > > the
> > > > > > > > > > > > correct
> > > > > > > > > > > > > > > >> > > > > "metadata.version" when initializing
> > the
> > > > > > > > migration.
> > > > > > > > > I
> > > > > > > > > > > > > included
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > IBP
> > > > > > > > > > > > > > > >> > > > of a
> > > > > > > > > > > > > > > >> > > > > broker in its registration data so the
> > > > KRaft
> > > > > > > > > > controller
> > > > > > > > > > > > can
> > > > > > > > > > > > > > > verify
> > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > >> > > > IBP
> > > > > > > > > > > > > > > >> > > > > and pick the correct
> > "metadata.version"
> > > > when
> > > > > > > > > starting
> > > > > > > > > > > the
> > > > > > > > > > > > > > > >> migration.
> > > > > > > > > > > > > > > >> > > > > Otherwise, we could inadvertently
> > > > downgrade
> > > > > > the
> > > > > > > > > > > > > > > >> IBP/metadata.version
> > > > > > > > > > > > > > > >> > as
> > > > > > > > > > > > > > > >> > > > > part of the migration.
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > > I also added "clusterId" to the broker
> > > > > > > > registration
> > > > > > > > > > data
> > > > > > > > > > > > so
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> KRaft
> > > > > > > > > > > > > > > >> > > > could
> > > > > > > > > > > > > > > >> > > > > verify it.
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > > -David
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > > On Mon, Oct 17, 2022 at 12:14 PM Colin
> > > > > McCabe
> > > > > > <
> > > > > > > > > > > > > > > cmcc...@apache.org
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > >> On Fri, Oct 14, 2022, at 11:39, Jun
> > Rao
> > > > > > wrote:
> > > > > > > > > > > > > > > >> > > > >> > Hi, Colin,
> > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > >> > > > >> > 10. "That all goes away in the new
> > > > mode,
> > > > > > and
> > > > > > > we
> > > > > > > > > > just
> > > > > > > > > > > > have
> > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > >> > code
> > > > > > > > > > > > > > > >> > > > which
> > > > > > > > > > > > > > > >> > > > >> > analyzes __cluster_metadata and
> > > > reflects
> > > > > it
> > > > > > > in
> > > > > > > > 1)
> > > > > > > > > > > > updates
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > >> > and
> > > > > > > > > > > > > > > >> > > 2)
> > > > > > > > > > > > > > > >> > > > >> > messages sent out to brokers."
> > > > > > > > > > > > > > > >> > > > >> > Hmm, I am not sure it's that
> > simple.
> > > > Some
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > > > complexity
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > >> > > > >> ZK-based
> > > > > > > > > > > > > > > >> > > > >> > controller are (1) using the
> > > pipelining
> > > > > > > > approach
> > > > > > > > > to
> > > > > > > > > > > > write
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > >> > for
> > > > > > > > > > > > > > > >> > > > >> better
> > > > > > > > > > > > > > > >> > > > >> > throughput and using conditional
> > > writes
> > > > > for
> > > > > > > > > > > > correctness;
> > > > > > > > > > > > > > (2)
> > > > > > > > > > > > > > > >> > sending
> > > > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > > > >> > > > >> > proper LeaderAndIsr and
> > > UpdateMetadata
> > > > > > > > requests.
> > > > > > > > > > For
> > > > > > > > > > > > > > example,
> > > > > > > > > > > > > > > >> > during
> > > > > > > > > > > > > > > >> > > > >> > controller failover, the full
> > > metadata
> > > > > > needs
> > > > > > > to
> > > > > > > > > be
> > > > > > > > > > > sent
> > > > > > > > > > > > > > while
> > > > > > > > > > > > > > > >> > during
> > > > > > > > > > > > > > > >> > > > >> > individual broker failure, only
> > some
> > > of
> > > > > the
> > > > > > > > > > metadata
> > > > > > > > > > > > > needs
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > >> be
> > > > > > > > > > > > > > > >> > > > updated.
> > > > > > > > > > > > > > > >> > > > >> > The controlled shutdown handling
> > > > > sometimes
> > > > > > > uses
> > > > > > > > > > > > > > > >> StopReplicaRequest
> > > > > > > > > > > > > > > >> > > > and
> > > > > > > > > > > > > > > >> > > > >> > some other times uses
> > > > > LeaderAndIsrRequest.
> > > > > > > (3)
> > > > > > > > > > > > triggering
> > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > >> > events
> > > > > > > > > > > > > > > >> > > > >> based
> > > > > > > > > > > > > > > >> > > > >> > on the responses of LeaderAndIsr
> > > (e.g.
> > > > > for
> > > > > > > > topic
> > > > > > > > > > > > > deletion).
> > > > > > > > > > > > > > > >> Some
> > > > > > > > > > > > > > > >> > of
> > > > > > > > > > > > > > > >> > > > those
> > > > > > > > > > > > > > > >> > > > >> > complexity could be re-implemented
> > > in a
> > > > > > more
> > > > > > > > > > > efficient
> > > > > > > > > > > > > way,
> > > > > > > > > > > > > > > >> but we
> > > > > > > > > > > > > > > >> > > > need
> > > > > > > > > > > > > > > >> > > > >> to
> > > > > > > > > > > > > > > >> > > > >> > be really careful not to generate
> > > > > > regression.
> > > > > > > > > Some
> > > > > > > > > > of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > >> > > > >> complexity
> > > > > > > > > > > > > > > >> > > > >> > just won't go away. Reimplementing
> > > all
> > > > > > those
> > > > > > > > > logic
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > 30
> > > > > > > > > > > > > > > >> or
> > > > > > > > > > > > > > > >> > so
> > > > > > > > > > > > > > > >> > > > >> events
> > > > > > > > > > > > > > > >> > > > >> > in the ZK-based controller is
> > > possible,
> > > > > but
> > > > > > > > > seems a
> > > > > > > > > > > bit
> > > > > > > > > > > > > > > >> daunting
> > > > > > > > > > > > > > > >> > and
> > > > > > > > > > > > > > > >> > > > >> risky.
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >> Hi Jun,
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >> Thanks for the response.  I agree
> > that
> > > > > there
> > > > > > is
> > > > > > > > > some
> > > > > > > > > > > work
> > > > > > > > > > > > > > here
> > > > > > > > > > > > > > > >> but I
> > > > > > > > > > > > > > > >> > > > don't
> > > > > > > > > > > > > > > >> > > > >> think it's as bad as it might seem.
> > > Most
> > > > of
> > > > > > the
> > > > > > > > > code
> > > > > > > > > > > for
> > > > > > > > > > > > > > > writing
> > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > >> > ZK
> > > > > > > > > > > > > > > >> > > > can
> > > > > > > > > > > > > > > >> > > > >> be reused from the old controller. We
> > > > > should
> > > > > > at
> > > > > > > > > least
> > > > > > > > > > > > > > evaluate
> > > > > > > > > > > > > > > >> using
> > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > >> > > > >> old ControllerChannelManager as
> > well. I
> > > > > don't
> > > > > > > > know
> > > > > > > > > if
> > > > > > > > > > > > we'll
> > > > > > > > > > > > > > end
> > > > > > > > > > > > > > > >> up
> > > > > > > > > > > > > > > >> > > > doing it
> > > > > > > > > > > > > > > >> > > > >> or not but it's a possibility. (The
> > > main
> > > > > > reason
> > > > > > > > to
> > > > > > > > > > not
> > > > > > > > > > > do
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > >> that
> > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > >> > > > >> response handling will be a bit
> > > > different)
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >> The question of what to do during a
> > > > > > controller
> > > > > > > > > > failover
> > > > > > > > > > > > is
> > > > > > > > > > > > > an
> > > > > > > > > > > > > > > >> > > > interesting
> > > > > > > > > > > > > > > >> > > > >> one. Technically, we should be able
> > to
> > > > > > continue
> > > > > > > > > > sending
> > > > > > > > > > > > > > > >> incremental
> > > > > > > > > > > > > > > >> > > > updates
> > > > > > > > > > > > > > > >> > > > >> to the legacy nodes, for the same
> > > reason
> > > > we
> > > > > > can
> > > > > > > > in
> > > > > > > > > > > KRaft
> > > > > > > > > > > > > > mode.
> > > > > > > > > > > > > > > >> > > However,
> > > > > > > > > > > > > > > >> > > > >> probably we should just copy what ZK
> > > mode
> > > > > > does
> > > > > > > > and
> > > > > > > > > > send
> > > > > > > > > > > > > them
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > >> full
> > > > > > > > > > > > > > > >> > > > >> metadata update. This will allow us
> > to
> > > > have
> > > > > > an
> > > > > > > > easy
> > > > > > > > > > way
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > handle
> > > > > > > > > > > > > > > >> > > > >> divergences caused by lost RPCs by
> > > > changing
> > > > > > the
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > > (just
> > > > > > > > > > > > > > > >> as
> > > > > > > > > > > > > > > >> > we
> > > > > > > > > > > > > > > >> > > > do
> > > > > > > > > > > > > > > >> > > > >> in ZK mode, unfortunately). I suppose
> > > we
> > > > > > should
> > > > > > > > > > > document
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > >> > > > KIP...
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >> I agree controlled shutdown is tricky
> > > to
> > > > > get
> > > > > > > just
> > > > > > > > > > > right.
> > > > > > > > > > > > I
> > > > > > > > > > > > > > > >> suppose
> > > > > > > > > > > > > > > >> > > this
> > > > > > > > > > > > > > > >> > > > is
> > > > > > > > > > > > > > > >> > > > >> a case where the RPCs we send out are
> > > not
> > > > > > > purely
> > > > > > > > > > "fire
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > >> forget";
> > > > > > > > > > > > > > > >> > we
> > > > > > > > > > > > > > > >> > > > have
> > > > > > > > > > > > > > > >> > > > >> to listen for the response. But that
> > > can
> > > > be
> > > > > > > done
> > > > > > > > in
> > > > > > > > > > an
> > > > > > > > > > > > > > > >> event-based
> > > > > > > > > > > > > > > >> > > way.
> > > > > > > > > > > > > > > >> > > > >> Controlled shutdown is also probably
> > > the
> > > > > last
> > > > > > > > thing
> > > > > > > > > > > we'll
> > > > > > > > > > > > > > > >> implement
> > > > > > > > > > > > > > > >> > > > once we
> > > > > > > > > > > > > > > >> > > > >> have the basic lift and shift.
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >> Topic deletion is another annoying
> > > > thing. I
> > > > > > > > wonder
> > > > > > > > > if
> > > > > > > > > > > we
> > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > >> just
> > > > > > > > > > > > > > > >> > > > delete
> > > > > > > > > > > > > > > >> > > > >> immediately here and skip the
> > > complexity
> > > > of
> > > > > > > > > > > implementing
> > > > > > > > > > > > > > > >> "deleting
> > > > > > > > > > > > > > > >> > > > state."
> > > > > > > > > > > > > > > >> > > > >> Topic IDs will exist in IBP 3.4, in
> > > both
> > > > ZK
> > > > > > and
> > > > > > > > > KRaft
> > > > > > > > > > > > mode,
> > > > > > > > > > > > > > so
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > >> > > > should be
> > > > > > > > > > > > > > > >> > > > >> OK, right?
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >> best,
> > > > > > > > > > > > > > > >> > > > >> Colin
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > >> > > > >> > Thanks,
> > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > >> > > > >> > Jun
> > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > >> > > > >> > On Fri, Oct 14, 2022 at 9:29 AM
> > Colin
> > > > > > McCabe
> > > > > > > <
> > > > > > > > > > > > > > > >> cmcc...@apache.org>
> > > > > > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > >> > > > >> >> On Thu, Oct 13, 2022, at 11:44,
> > Jun
> > > > Rao
> > > > > > > wrote:
> > > > > > > > > > > > > > > >> > > > >> >> > Hi, Colin,
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> > Thanks for the reply.
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> > 10. This is a bit on the
> > > > > implementation
> > > > > > > > side.
> > > > > > > > > If
> > > > > > > > > > > you
> > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > >> > > > >> existing
> > > > > > > > > > > > > > > >> > > > >> >> > ZK-based controller, most of the
> > > > logic
> > > > > > is
> > > > > > > > > around
> > > > > > > > > > > > > > > >> maintaining an
> > > > > > > > > > > > > > > >> > > > >> in-memory
> > > > > > > > > > > > > > > >> > > > >> >> > state of all the resources
> > > (broker,
> > > > > > topic,
> > > > > > > > > > > > partition,
> > > > > > > > > > > > > > > etc),
> > > > > > > > > > > > > > > >> > > > >> >> reading/writing
> > > > > > > > > > > > > > > >> > > > >> >> > to ZK, sending LeaderAndIsr and
> > > > > > > > UpdateMetadata
> > > > > > > > > > > > > requests
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > >> > > > handling
> > > > > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > > > > >> > > > >> >> > responses to brokers. So we need
> > > all
> > > > > > that
> > > > > > > > > logic
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > dual
> > > > > > > > > > > > > > > >> > write
> > > > > > > > > > > > > > > >> > > > >> mode.
> > > > > > > > > > > > > > > >> > > > >> >> One
> > > > > > > > > > > > > > > >> > > > >> >> > option is to duplicate all that
> > > > logic
> > > > > in
> > > > > > > > some
> > > > > > > > > > new
> > > > > > > > > > > > > code.
> > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > >> > can
> > > > > > > > > > > > > > > >> > > > be a
> > > > > > > > > > > > > > > >> > > > >> bit
> > > > > > > > > > > > > > > >> > > > >> >> > error prone and makes the code a
> > > bit
> > > > > > > harder
> > > > > > > > to
> > > > > > > > > > > > > maintain
> > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > >> we
> > > > > > > > > > > > > > > >> > > need
> > > > > > > > > > > > > > > >> > > > to
> > > > > > > > > > > > > > > >> > > > >> fix
> > > > > > > > > > > > > > > >> > > > >> >> > some critical issues in ZK-based
> > > > > > > > controllers.
> > > > > > > > > > > > Another
> > > > > > > > > > > > > > > >> option is
> > > > > > > > > > > > > > > >> > > to
> > > > > > > > > > > > > > > >> > > > try
> > > > > > > > > > > > > > > >> > > > >> >> > reusing the existing code in the
> > > > > > ZK-based
> > > > > > > > > > > > controller.
> > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > >> > > example,
> > > > > > > > > > > > > > > >> > > > we
> > > > > > > > > > > > > > > >> > > > >> >> could
> > > > > > > > > > > > > > > >> > > > >> >> > start the EventManager in the
> > > > ZK-based
> > > > > > > > > > controller,
> > > > > > > > > > > > but
> > > > > > > > > > > > > > let
> > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > >> > > > KRaft
> > > > > > > > > > > > > > > >> > > > >> >> > controller ingest new events.
> > This
> > > > has
> > > > > > its
> > > > > > > > own
> > > > > > > > > > > > > > challenges:
> > > > > > > > > > > > > > > >> (1)
> > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > >> > > > >> >> existing
> > > > > > > > > > > > > > > >> > > > >> >> > logic only logs ZK failures and
> > > > > doesn't
> > > > > > > > expose
> > > > > > > > > > > them
> > > > > > > > > > > > to
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > > caller;
> > > > > > > > > > > > > > > >> > > > (2)
> > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > >> > > > >> >> > existing logic may add new
> > events
> > > to
> > > > > the
> > > > > > > > queue
> > > > > > > > > > > > itself
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > >> > > > probably
> > > > > > > > > > > > > > > >> > > > >> >> need
> > > > > > > > > > > > > > > >> > > > >> >> > to think through how this is
> > > > > coordinated
> > > > > > > > with
> > > > > > > > > > the
> > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> > > controller;
> > > > > > > > > > > > > > > >> > > > >> (3)
> > > > > > > > > > > > > > > >> > > > >> >> it
> > > > > > > > > > > > > > > >> > > > >> >> > registers some ZK listeners
> > > > > > unnecessarily
> > > > > > > > (may
> > > > > > > > > > not
> > > > > > > > > > > > be
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > big
> > > > > > > > > > > > > > > >> > > > concern).
> > > > > > > > > > > > > > > >> > > > >> So
> > > > > > > > > > > > > > > >> > > > >> >> we
> > > > > > > > > > > > > > > >> > > > >> >> > need to get around those issues
> > > > > > somehow. I
> > > > > > > > am
> > > > > > > > > > > > > wondering
> > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > >> we
> > > > > > > > > > > > > > > >> > > have
> > > > > > > > > > > > > > > >> > > > >> >> > considered both options and
> > which
> > > > > > approach
> > > > > > > > we
> > > > > > > > > > are
> > > > > > > > > > > > > > leaning
> > > > > > > > > > > > > > > >> > towards
> > > > > > > > > > > > > > > >> > > > for
> > > > > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > > > > >> > > > >> >> > implementation.
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >> >> Yes, this is a good question. My
> > > take
> > > > is
> > > > > > > that
> > > > > > > > a
> > > > > > > > > > big
> > > > > > > > > > > > part
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > >> > > > >> complexity
> > > > > > > > > > > > > > > >> > > > >> >> in the old controller code results
> > > > from
> > > > > > the
> > > > > > > > fact
> > > > > > > > > > > that
> > > > > > > > > > > > we
> > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > >> ZK
> > > > > > > > > > > > > > > >> > as
> > > > > > > > > > > > > > > >> > > a
> > > > > > > > > > > > > > > >> > > > >> >> multi-writer database for
> > > propagating
> > > > > > > > > information
> > > > > > > > > > > > > between
> > > > > > > > > > > > > > > >> > different
> > > > > > > > > > > > > > > >> > > > >> >> components. So in the old
> > > controller,
> > > > > > every
> > > > > > > > > write
> > > > > > > > > > to
> > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > needs
> > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > >> > be
> > > > > > > > > > > > > > > >> > > > >> >> structured as a compare and swap
> > to
> > > be
> > > > > > fully
> > > > > > > > > > > correct.
> > > > > > > > > > > > > > Every
> > > > > > > > > > > > > > > >> time
> > > > > > > > > > > > > > > >> > we
> > > > > > > > > > > > > > > >> > > > get
> > > > > > > > > > > > > > > >> > > > >> >> notified about something, it's
> > > usually
> > > > > in
> > > > > > > the
> > > > > > > > > form
> > > > > > > > > > > of
> > > > > > > > > > > > > > "this
> > > > > > > > > > > > > > > >> znode
> > > > > > > > > > > > > > > >> > > > >> changed"
> > > > > > > > > > > > > > > >> > > > >> >> which prompts a full reload of
> > part
> > > of
> > > > > the
> > > > > > > > data
> > > > > > > > > in
> > > > > > > > > > > ZK
> > > > > > > > > > > > > > (which
> > > > > > > > > > > > > > > >> > itself
> > > > > > > > > > > > > > > >> > > > has
> > > > > > > > > > > > > > > >> > > > >> >> multiple parts, loading,
> > > > deserializing,
> > > > > > > > > > reconciling,
> > > > > > > > > > > > > etc.)
> > > > > > > > > > > > > > > >> That
> > > > > > > > > > > > > > > >> > all
> > > > > > > > > > > > > > > >> > > > goes
> > > > > > > > > > > > > > > >> > > > >> >> away in the new mode, and we just
> > > have
> > > > > > some
> > > > > > > > code
> > > > > > > > > > > which
> > > > > > > > > > > > > > > >> analyzes
> > > > > > > > > > > > > > > >> > > > >> >> __cluster_metadata and reflects it
> > > in
> > > > 1)
> > > > > > > > updates
> > > > > > > > > > to
> > > > > > > > > > > ZK
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > 2)
> > > > > > > > > > > > > > > >> > > > messages
> > > > > > > > > > > > > > > >> > > > >> sent
> > > > > > > > > > > > > > > >> > > > >> >> out to brokers.
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >> >> This is pretty decoupled from the
> > > > other
> > > > > > > logic
> > > > > > > > in
> > > > > > > > > > > > > > > >> QuorumController
> > > > > > > > > > > > > > > >> > > and
> > > > > > > > > > > > > > > >> > > > >> >> should be easy to unit test, since
> > > the
> > > > > > same
> > > > > > > > > inputs
> > > > > > > > > > > > from
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> log
> > > > > > > > > > > > > > > >> > > > always
> > > > > > > > > > > > > > > >> > > > >> >> produce the same output in ZK.
> > > > > Basically,
> > > > > > ZK
> > > > > > > > is
> > > > > > > > > > > > > write-only
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > >> > us,
> > > > > > > > > > > > > > > >> > > > we do
> > > > > > > > > > > > > > > >> > > > >> >> not read it (with the big
> > exception
> > > of
> > > > > > > broker
> > > > > > > > > > > > > registration
> > > > > > > > > > > > > > > >> > znodes)
> > > > > > > > > > > > > > > >> > > > and I
> > > > > > > > > > > > > > > >> > > > >> >> think that will greatly simplify
> > > > things.
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >> >> So I think dual-write mode as
> > > > described
> > > > > > here
> > > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > > > > > >> > substantially
> > > > > > > > > > > > > > > >> > > > >> simpler
> > > > > > > > > > > > > > > >> > > > >> >> than trying to run part or all of
> > > the
> > > > > old
> > > > > > > > > > controller
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > >> > parallel. I
> > > > > > > > > > > > > > > >> > > > do
> > > > > > > > > > > > > > > >> > > > >> >> think we will reuse a bunch of the
> > > > > > > > > serialization /
> > > > > > > > > > > > > > > >> > deserialization
> > > > > > > > > > > > > > > >> > > > code
> > > > > > > > > > > > > > > >> > > > >> for
> > > > > > > > > > > > > > > >> > > > >> >> znodes and possibly the code for
> > > > > > > communicating
> > > > > > > > > > with
> > > > > > > > > > > > ZK.
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >> >> best,
> > > > > > > > > > > > > > > >> > > > >> >> Colin
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> > 14. Good point and make sense.
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> > Thanks,
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> > Jun
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> > On Wed, Oct 12, 2022 at 3:27 PM
> > > > Colin
> > > > > > > > McCabe <
> > > > > > > > > > > > > > > >> > cmcc...@apache.org
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > > > >> wrote:
> > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> Hi Jun,
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> Thanks for taking a look. I can
> > > > > answer
> > > > > > > some
> > > > > > > > > > > > questions
> > > > > > > > > > > > > > > here
> > > > > > > > > > > > > > > >> > > > because I
> > > > > > > > > > > > > > > >> > > > >> >> >> collaborated on this a bit, and
> > > > David
> > > > > > is
> > > > > > > on
> > > > > > > > > > > > vacation
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > >> few
> > > > > > > > > > > > > > > >> > > > days.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> On Wed, Oct 12, 2022, at 14:41,
> > > Jun
> > > > > Rao
> > > > > > > > > wrote:
> > > > > > > > > > > > > > > >> > > > >> >> >> > Hi, David,
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> > Thanks for the KIP. A few
> > > > comments
> > > > > > > below.
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> > 10. It's still not very clear
> > > to
> > > > me
> > > > > > how
> > > > > > > > the
> > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> controller
> > > > > > > > > > > > > > > >> > > > works
> > > > > > > > > > > > > > > >> > > > >> in
> > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > >> > > > >> >> >> > dual writes mode to KRaft log
> > > and
> > > > > ZK
> > > > > > > when
> > > > > > > > > the
> > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > >> still
> > > > > > > > > > > > > > > >> > > run
> > > > > > > > > > > > > > > >> > > > in
> > > > > > > > > > > > > > > >> > > > >> ZK
> > > > > > > > > > > > > > > >> > > > >> >> >> mode.
> > > > > > > > > > > > > > > >> > > > >> >> >> > Does the KRaft controller
> > run a
> > > > ZK
> > > > > > > based
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > >> > > parallel
> > > > > > > > > > > > > > > >> > > > or
> > > > > > > > > > > > > > > >> > > > >> do
> > > > > > > > > > > > > > > >> > > > >> >> we
> > > > > > > > > > > > > > > >> > > > >> >> >> > derive what needs to be
> > written
> > > > to
> > > > > ZK
> > > > > > > > based
> > > > > > > > > > on
> > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> > > controller
> > > > > > > > > > > > > > > >> > > > >> logic?
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> We derive what needs to be
> > > written
> > > > to
> > > > > > ZK
> > > > > > > > > based
> > > > > > > > > > on
> > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> > > controller
> > > > > > > > > > > > > > > >> > > > >> >> logic.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> > I am also not sure how the
> > > KRaft
> > > > > > > > controller
> > > > > > > > > > > > handles
> > > > > > > > > > > > > > > >> broker
> > > > > > > > > > > > > > > >> > > > >> >> >> > registration/deregistration,
> > > > since
> > > > > > > > brokers
> > > > > > > > > > are
> > > > > > > > > > > > > still
> > > > > > > > > > > > > > > >> running
> > > > > > > > > > > > > > > >> > > in
> > > > > > > > > > > > > > > >> > > > ZK
> > > > > > > > > > > > > > > >> > > > >> >> mode
> > > > > > > > > > > > > > > >> > > > >> >> >> and
> > > > > > > > > > > > > > > >> > > > >> >> >> > are not heartbeating to the
> > > KRaft
> > > > > > > > > controller.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> The new controller will listen
> > > for
> > > > > > broker
> > > > > > > > > > > > > registrations
> > > > > > > > > > > > > > > >> under
> > > > > > > > > > > > > > > >> > > > >> /brokers.
> > > > > > > > > > > > > > > >> > > > >> >> >> This is the only znode watch
> > that
> > > > the
> > > > > > new
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > >> do.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> We did consider changing how
> > > > ZK-based
> > > > > > > > broker
> > > > > > > > > > > > > > registration
> > > > > > > > > > > > > > > >> > > worked,
> > > > > > > > > > > > > > > >> > > > >> but it
> > > > > > > > > > > > > > > >> > > > >> >> >> just ended up being too much
> > work
> > > > for
> > > > > > not
> > > > > > > > > > enough
> > > > > > > > > > > > > gain.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> > 12. "A new set of nodes will
> > be
> > > > > > > > provisioned
> > > > > > > > > > to
> > > > > > > > > > > > host
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > > > controller
> > > > > > > > > > > > > > > >> > > > >> >> >> quorum."
> > > > > > > > > > > > > > > >> > > > >> >> >> > I guess we don't support
> > > starting
> > > > > the
> > > > > > > > KRaft
> > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > >> > quorum
> > > > > > > > > > > > > > > >> > > on
> > > > > > > > > > > > > > > >> > > > >> >> existing
> > > > > > > > > > > > > > > >> > > > >> >> >> > brokers. It would be useful
> > to
> > > > make
> > > > > > > that
> > > > > > > > > > clear.
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> Agreed
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> > 13. "Once the quorum is
> > > > established
> > > > > > > and a
> > > > > > > > > > > leader
> > > > > > > > > > > > is
> > > > > > > > > > > > > > > >> elected,
> > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > >> > > > >> >> >> controller
> > > > > > > > > > > > > > > >> > > > >> >> >> > will check the state of the
> > > > cluster
> > > > > > > using
> > > > > > > > > the
> > > > > > > > > > > > > > > >> MigrationCheck
> > > > > > > > > > > > > > > >> > > > RPC."
> > > > > > > > > > > > > > > >> > > > >> How
> > > > > > > > > > > > > > > >> > > > >> >> >> does
> > > > > > > > > > > > > > > >> > > > >> >> >> > the quorum controller detect
> > > > other
> > > > > > > > brokers?
> > > > > > > > > > > Does
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > > controller
> > > > > > > > > > > > > > > >> > > > >> node
> > > > > > > > > > > > > > > >> > > > >> >> need
> > > > > > > > > > > > > > > >> > > > >> >> >> > to be configured with ZK
> > > > connection
> > > > > > > > string?
> > > > > > > > > > If
> > > > > > > > > > > > so,
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > >> would
> > > > > > > > > > > > > > > >> > be
> > > > > > > > > > > > > > > >> > > > >> useful
> > > > > > > > > > > > > > > >> > > > >> >> to
> > > > > > > > > > > > > > > >> > > > >> >> >> > document the additional
> > configs
> > > > > that
> > > > > > > the
> > > > > > > > > > quorum
> > > > > > > > > > > > > > > >> controller
> > > > > > > > > > > > > > > >> > > > needs to
> > > > > > > > > > > > > > > >> > > > >> >> set.
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> Yes, the controllers monitor ZK
> > > for
> > > > > > > broker
> > > > > > > > > > > > > > registrations,
> > > > > > > > > > > > > > > >> as I
> > > > > > > > > > > > > > > >> > > > >> mentioned
> > > > > > > > > > > > > > > >> > > > >> >> >> above. So they need zk.connect
> > > and
> > > > > the
> > > > > > > > other
> > > > > > > > > ZK
> > > > > > > > > > > > > > > connection
> > > > > > > > > > > > > > > >> > > > >> >> configurations.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> > 14. "In order to prevent
> > > further
> > > > > > writes
> > > > > > > > to
> > > > > > > > > > ZK,
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > >> > thing
> > > > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > > > >> > > > >> new
> > > > > > > > > > > > > > > >> > > > >> >> >> > KRaft quorum must do is take
> > > over
> > > > > > > > > leadership
> > > > > > > > > > of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > >> > > > controller.
> > > > > > > > > > > > > > > >> > > > >> "
> > > > > > > > > > > > > > > >> > > > >> >> The
> > > > > > > > > > > > > > > >> > > > >> >> >> ZK
> > > > > > > > > > > > > > > >> > > > >> >> >> > controller processing changes
> > > to
> > > > > > > > > /controller
> > > > > > > > > > > > update
> > > > > > > > > > > > > > > >> > > > asynchronously.
> > > > > > > > > > > > > > > >> > > > >> >> How
> > > > > > > > > > > > > > > >> > > > >> >> >> > does the KRaft controller
> > know
> > > > when
> > > > > > the
> > > > > > > > ZK
> > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > >> > > > resigned
> > > > > > > > > > > > > > > >> > > > >> >> before
> > > > > > > > > > > > > > > >> > > > >> >> >> > it can safely copy the ZK
> > data?
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> This should be done through
> > > > > > > > > > > > > > > >> expectedControllerEpochZkVersion,
> > > > > > > > > > > > > > > >> > > just
> > > > > > > > > > > > > > > >> > > > >> like
> > > > > > > > > > > > > > > >> > > > >> >> in
> > > > > > > > > > > > > > > >> > > > >> >> >> ZK mode, right? We should bump
> > > this
> > > > > > epoch
> > > > > > > > > value
> > > > > > > > > > > so
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > > >> > > writes
> > > > > > > > > > > > > > > >> > > > >> from
> > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > >> > > > >> >> >> old controller will not go
> > > > through. I
> > > > > > > agree
> > > > > > > > > we
> > > > > > > > > > > > should
> > > > > > > > > > > > > > > spell
> > > > > > > > > > > > > > > >> > this
> > > > > > > > > > > > > > > >> > > > out
> > > > > > > > > > > > > > > >> > > > >> in
> > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > >> > > > >> >> >> KIP.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> > 15. We have the following
> > > > > sentences.
> > > > > > > One
> > > > > > > > > says
> > > > > > > > > > > > > > > >> ControllerId
> > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > >> > > a
> > > > > > > > > > > > > > > >> > > > >> random
> > > > > > > > > > > > > > > >> > > > >> >> >> > KRaft broker and the other
> > says
> > > > > it's
> > > > > > > the
> > > > > > > > > > active
> > > > > > > > > > > > > > > >> controller.
> > > > > > > > > > > > > > > >> > > > Which
> > > > > > > > > > > > > > > >> > > > >> one
> > > > > > > > > > > > > > > >> > > > >> >> is
> > > > > > > > > > > > > > > >> > > > >> >> >> > correct?
> > > > > > > > > > > > > > > >> > > > >> >> >> > "UpdateMetadata: for certain
> > > > > metadata
> > > > > > > > > > changes,
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> > > > controller
> > > > > > > > > > > > > > > >> > > > >> >> will
> > > > > > > > > > > > > > > >> > > > >> >> >> > need to send
> > > > UpdateMetadataRequests
> > > > > > to
> > > > > > > > the
> > > > > > > > > ZK
> > > > > > > > > > > > > > brokers.
> > > > > > > > > > > > > > > >> For
> > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > >> > > > >> >> >> > “ControllerId” field in this
> > > > > request,
> > > > > > > the
> > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > >> should
> > > > > > > > > > > > > > > >> > > > >> specify a
> > > > > > > > > > > > > > > >> > > > >> >> >> > random KRaft broker."
> > > > > > > > > > > > > > > >> > > > >> >> >> > "In the UpdateMetadataRequest
> > > > sent
> > > > > by
> > > > > > > the
> > > > > > > > > > KRaft
> > > > > > > > > > > > > > > >> controller
> > > > > > > > > > > > > > > >> > to
> > > > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > > > >> > > > >> ZK
> > > > > > > > > > > > > > > >> > > > >> >> >> > brokers, the ControllerId
> > will
> > > > > point
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > > active
> > > > > > > > > > > > > > > >> > controller
> > > > > > > > > > > > > > > >> > > > which
> > > > > > > > > > > > > > > >> > > > >> >> will
> > > > > > > > > > > > > > > >> > > > >> >> >> be
> > > > > > > > > > > > > > > >> > > > >> >> >> > used for the inter-broker
> > > > > requests."
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> Yeah, this seems like an error
> > to
> > > > me
> > > > > as
> > > > > > > > > well. A
> > > > > > > > > > > > > random
> > > > > > > > > > > > > > > >> value
> > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > >> > > > not
> > > > > > > > > > > > > > > >> > > > >> >> really
> > > > > > > > > > > > > > > >> > > > >> >> >> useful. Plus the text here is
> > > > > > > > > > self-contradictory,
> > > > > > > > > > > > as
> > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > >> > pointed
> > > > > > > > > > > > > > > >> > > > out.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> I suspect what we should do
> > here
> > > is
> > > > > > add a
> > > > > > > > new
> > > > > > > > > > > > field,
> > > > > > > > > > > > > > > >> > > > >> KRaftControllerId,
> > > > > > > > > > > > > > > >> > > > >> >> >> and populate it with the real
> > > > > > controller
> > > > > > > > ID,
> > > > > > > > > > and
> > > > > > > > > > > > > leave
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> old
> > > > > > > > > > > > > > > >> > > > >> >> controllerId
> > > > > > > > > > > > > > > >> > > > >> >> >> field as -1. A ZK-based broker
> > > that
> > > > > > sees
> > > > > > > > this
> > > > > > > > > > can
> > > > > > > > > > > > > then
> > > > > > > > > > > > > > > >> consult
> > > > > > > > > > > > > > > >> > > its
> > > > > > > > > > > > > > > >> > > > >> >> >> controller.quorum.voters
> > > > > configuration
> > > > > > to
> > > > > > > > see
> > > > > > > > > > > where
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > >> should
> > > > > > > > > > > > > > > >> > > send
> > > > > > > > > > > > > > > >> > > > >> >> >> controller-bound RPCs. That
> > > > (static)
> > > > > > > > > > > configuration
> > > > > > > > > > > > > lets
> > > > > > > > > > > > > > > us
> > > > > > > > > > > > > > > >> map
> > > > > > > > > > > > > > > >> > > > >> between
> > > > > > > > > > > > > > > >> > > > >> >> >> controller ID and host:port.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> We should still keep our
> > existing
> > > > > epoch
> > > > > > > > logic
> > > > > > > > > > for
> > > > > > > > > > > > > > > deciding
> > > > > > > > > > > > > > > >> > when
> > > > > > > > > > > > > > > >> > > > >> >> >> UpdateMetadataRequest /
> > > > > > > > LeaderAndIsrRequests
> > > > > > > > > > are
> > > > > > > > > > > > > stale,
> > > > > > > > > > > > > > > >> with
> > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > >> > > > >> caveat
> > > > > > > > > > > > > > > >> > > > >> >> >> that any kraft-based epoch
> > should
> > > > be
> > > > > > > > treated
> > > > > > > > > as
> > > > > > > > > > > > > greater
> > > > > > > > > > > > > > > >> than
> > > > > > > > > > > > > > > >> > any
> > > > > > > > > > > > > > > >> > > > >> >> ZK-based
> > > > > > > > > > > > > > > >> > > > >> >> >> epoch. After all, the kraft
> > epoch
> > > > is
> > > > > > > coming
> > > > > > > > > > from
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > epoch
> > > > > > > > > > > > > > > >> of
> > > > > > > > > > > > > > > >> > > > >> >> >> __cluster_metadata, whereas the
> > > ZK
> > > > > > epoch
> > > > > > > > > comes
> > > > > > > > > > > from
> > > > > > > > > > > > > ZK.
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> > 16. "Additionally, the
> > > controller
> > > > > > must
> > > > > > > > > > specify
> > > > > > > > > > > > if a
> > > > > > > > > > > > > > > >> broker
> > > > > > > > > > > > > > > >> > in
> > > > > > > > > > > > > > > >> > > > >> >> >> “LiveBrokers”
> > > > > > > > > > > > > > > >> > > > >> >> >> > is KRaft or ZK." Does that
> > > > require
> > > > > > any
> > > > > > > > > > protocol
> > > > > > > > > > > > > > changes
> > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > >> > > > >> >> >> UpdateMetadata?
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> Yeah, I am also curious why the
> > > we
> > > > > need
> > > > > > > to
> > > > > > > > > care
> > > > > > > > > > > > > whether
> > > > > > > > > > > > > > > >> > brokers
> > > > > > > > > > > > > > > >> > > > are
> > > > > > > > > > > > > > > >> > > > >> ZK
> > > > > > > > > > > > > > > >> > > > >> >> or
> > > > > > > > > > > > > > > >> > > > >> >> >> KRaft in UpdateMetadataRequest.
> > > We
> > > > > > don't
> > > > > > > > > reveal
> > > > > > > > > > > > this
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > >> > clients,
> > > > > > > > > > > > > > > >> > > > so
> > > > > > > > > > > > > > > >> > > > >> can
> > > > > > > > > > > > > > > >> > > > >> >> we
> > > > > > > > > > > > > > > >> > > > >> >> >> just leave this out?
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> best,
> > > > > > > > > > > > > > > >> > > > >> >> >> Colin
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> > Thanks,
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> > Jun
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> > On Wed, Oct 5, 2022 at 10:07
> > AM
> > > > > > Mickael
> > > > > > > > > > Maison
> > > > > > > > > > > <
> > > > > > > > > > > > > > > >> > > > >> >> mickael.mai...@gmail.com
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> > wrote:
> > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> >> Hi David,
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >> Thanks for starting this
> > > > important
> > > > > > > KIP.
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >> I've just taken a quick look
> > > so
> > > > > far
> > > > > > > but
> > > > > > > > > I've
> > > > > > > > > > > > got a
> > > > > > > > > > > > > > > >> couple
> > > > > > > > > > > > > > > >> > of
> > > > > > > > > > > > > > > >> > > > >> initial
> > > > > > > > > > > > > > > >> > > > >> >> >> >> questions:
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >> 1) What happens if a non
> > KRaft
> > > > > > > > compatible
> > > > > > > > > > > broker
> > > > > > > > > > > > > (or
> > > > > > > > > > > > > > > >> with
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > kafka.metadata.migration.enable
> > > > > set
> > > > > > to
> > > > > > > > > > false)
> > > > > > > > > > > > > joins
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > > cluster
> > > > > > > > > > > > > > > >> > > > >> after
> > > > > > > > > > > > > > > >> > > > >> >> >> >> the migration is triggered?
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >> 2) In the Failure Modes
> > > section
> > > > > you
> > > > > > > > > mention
> > > > > > > > > > a
> > > > > > > > > > > > > > scenario
> > > > > > > > > > > > > > > >> > where
> > > > > > > > > > > > > > > >> > > a
> > > > > > > > > > > > > > > >> > > > >> write
> > > > > > > > > > > > > > > >> > > > >> >> >> >> to ZK fails. What happens
> > when
> > > > the
> > > > > > > > > > divergence
> > > > > > > > > > > > > limit
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > >> > > > reached? Is
> > > > > > > > > > > > > > > >> > > > >> >> >> >> this a fatal condition? How
> > > much
> > > > > > > > > divergence
> > > > > > > > > > > > should
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > >> > allow?
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >> Thanks,
> > > > > > > > > > > > > > > >> > > > >> >> >> >> Mickael
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >> >> On Wed, Oct 5, 2022 at 12:20
> > > AM
> > > > > > David
> > > > > > > > > > Arthur <
> > > > > > > > > > > > > > > >> > > mum...@gmail.com
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > >> >> wrote:
> > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > Hey folks, I wanted to get
> > > the
> > > > > > ball
> > > > > > > > > > rolling
> > > > > > > > > > > on
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >> > > discussion
> > > > > > > > > > > > > > > >> > > > >> for
> > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > ZooKeeper migration KIP.
> > > This
> > > > > KIP
> > > > > > > > > details
> > > > > > > > > > > how
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > >> > do
> > > > > > > > > > > > > > > >> > > > an
> > > > > > > > > > > > > > > >> > > > >> >> online
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > migration of metadata from
> > > > > > ZooKeeper
> > > > > > > > to
> > > > > > > > > > > KRaft
> > > > > > > > > > > > as
> > > > > > > > > > > > > > > well
> > > > > > > > > > > > > > > >> as
> > > > > > > > > > > > > > > >> > a
> > > > > > > > > > > > > > > >> > > > >> rolling
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > upgrade of brokers to
> > KRaft
> > > > > mode.
> > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > The general idea is to
> > keep
> > > > > KRaft
> > > > > > > and
> > > > > > > > > > > > ZooKeeper
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > >> sync
> > > > > > > > > > > > > > > >> > > > during
> > > > > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > migration, so both types
> > of
> > > > > > brokers
> > > > > > > > can
> > > > > > > > > > > exist
> > > > > > > > > > > > > > > >> > > simultaneously.
> > > > > > > > > > > > > > > >> > > > >> Then,
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > once everything is
> > migrated
> > > > and
> > > > > > > > updated,
> > > > > > > > > > we
> > > > > > > > > > > > can
> > > > > > > > > > > > > > turn
> > > > > > > > > > > > > > > >> off
> > > > > > > > > > > > > > > >> > > > >> ZooKeeper
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > writes.
> > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > This is a pretty complex
> > > KIP,
> > > > so
> > > > > > > > please
> > > > > > > > > > > take a
> > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > :)
> > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > Thanks!
> > > > > > > > > > > > > > > >> > > > >> >> >> >> > David
> > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > > --
> > > > > > > > > > > > > > > >> > > > > -David
> > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> > > --
> > > > > > > > > > > > > > > >> > > -David
> > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > -David
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > -David
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > -David
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > -David
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -David
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -David
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -David
> > > >
> > >
> >
> >
> > --
> > -David
> >



-- 
David Arthur

Reply via email to