Hi all, I was doing some JIRA cleanup and noticed someone commenting about the language in KIP-866 referencing an offline KRaft -> ZK conversion tool. Since we never ended up implementing this, I have removed it from the KIP.
best, Colin On Wed, Nov 30, 2022, at 12:27, David Arthur wrote: > Hey everyone, I'm going to close out the vote. There were three +1 > votes from PMC members and no committer or community votes. > > +1 PMC votes: > * Colin McCabe > * Jason Gustafson > * Jun Rao > > With that, the vote passes. Thanks to everyone who reviewed this KIP! > > Cheers, > David > > On Tue, Nov 29, 2022 at 7:48 PM Jun Rao <j...@confluent.io.invalid> wrote: >> >> Hi, David, >> >> Thanks for the KIP. +1 >> >> Jun >> >> On Tue, Nov 22, 2022 at 3:43 PM Jason Gustafson <ja...@confluent.io.invalid> >> wrote: >> >> > Thanks, +1 from me. I suspect we might be in for at least one surprise with >> > the re-implemented controller RPCs, but I agree the alternative has risks >> > as well. >> > >> > Best, >> > Jason >> > >> > On Mon, Nov 14, 2022 at 12:00 PM Colin McCabe <cmcc...@apache.org> wrote: >> > >> > > On Fri, Nov 11, 2022, at 08:59, David Arthur wrote: >> > > > Thanks, Colin. >> > > > >> > > >> never start an upgrade without first verifying the quorum >> > configuration >> > > on the ZK-based brokers >> > > > >> > > > I agree that this is a pretty big benefit. I could imagine debugging >> > > > and fixing connection problems mid-migration would be a big pain. >> > > > Especially if you had some brokers correctly configured, and others >> > > > not. >> > > > >> > > > Adding a heartbeat raises some questions about what to do if a broker >> > > > goes into a bad state, or stops heartbeating, during a migration. >> > > > However, I think the same is true for a registration based approach, >> > > > so maybe it's not an increase in net complexity. >> > > > >> > > >> > > Hi David, >> > > >> > > Yeah. I think the goal should be for the set of heartbeaters to match the >> > > set of broker registrations under /brokers >> > > >> > > Obviously, people could add or remove brokers after the upgrade has >> > begun, >> > > but that's unavoidable, I think. We can at least ensure that at the time >> > we >> > > enter upgrade, all the brokers are ready. >> > > >> > > > I've replaced the ZK registration section with a new RPC and brief >> > > > description. Please take a look. >> > > > >> > > >> > > Thanks, David. With these changes it LGTM to me. >> > > >> > > +1 (binding) >> > > >> > > Colin >> > > >> > > > Thanks! >> > > > David >> > > > >> > > > On Wed, Nov 9, 2022 at 5:46 PM Colin McCabe <cmcc...@apache.org> >> > wrote: >> > > >> >> > > >> Hi David, >> > > >> >> > > >> Thanks for the response. Replies inline. >> > > >> >> > > >> On Wed, Nov 9, 2022, at 08:17, David Arthur wrote: >> > > >> > Colin >> > > >> > >> > > >> >> Maybe zk.metadata.migration.enable ? >> > > >> > >> > > >> > Done. I went with "zookeeper.metadata.migration.enable" since our >> > > >> > other ZK configs start with "zookeeper.*" >> > > >> > >> > > >> >> SImilarly, for MigrationRecord: can we rename this to >> > > ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState. >> > > >> > >> > > >> > Sure >> > > >> > >> > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is that >> > > we will eventually compact all the metadata logs into a snapshot. That >> > > snapshot will then have to keep alive the memory of the old migration. So >> > > it is not really a matter of replaying the old metadata logs (probably) >> > but >> > > a matter of checking to see what the ZkMigrationState is, which I suppose >> > > could be Optional<ZkMigrationState>. If it's not Optional.empty, we >> > already >> > > migrated / are migrating. >> > > >> > >> > > >> > Yea, makes sense. >> > > >> > >> > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I >> > > thought ZK already tracked this information in the mzxid of the znode? >> > > >> > >> > > >> > Yes, Jun pointed this out previously, I missed this update in the >> > KIP. >> > > >> > Fixed now. >> > > >> > >> > > >> >> It is true that technically it is only needed in UMR, but I would >> > > still suggest including KRaftControllerId in LeaderAndIsrRequest because >> > it >> > > will make debugging much easier. >> > > >> >> >> > > >> >> I would suggest not implementing the topic deletion state machine, >> > > but just deleting topics eagerly when in migration mode. We can implement >> > > this behavior change by keying off of whether KRaftControllerId is >> > present >> > > in LeaderAndIsrRequest. On broker startup, we'll be sent a full >> > > LeaderAndIsrRequest and can delete stray partitions whose IDs are not as >> > > expected (again, this behavior change would only be for migration mode) >> > > >> > >> > > >> > Sounds good to me. Since this is somewhat of an implementation >> > detail, >> > > >> > do you think we need this included in the KIP? >> > > >> >> > > >> Yeah, maybe we don't need to go into the delete behavior here. But I >> > > think the KIP should specify that we have KRaftControllerId in both >> > > LeaderAndIsrRequest. That will allow us to implement this behavior >> > > conditionally on zk-based brokers when in dual write mode. >> > > >> >> > > >> > >> > > >> >> For existing KRaft controllers, will >> > > kafka.controller:type=KafkaController,name=MigrationState show up as 4 >> > > (MigrationFinalized)? I assume this is true, but it would be good to >> > spell >> > > it out. Sorry if this is answered somewhere else. >> > > >> > >> > > >> > We discussed using 0 (None) as the value to report for original, >> > > >> > un-migrated KRaft clusters. 4 (MigrationFinalized) would be for >> > > >> > clusters which underwent a migration. I have some description of >> > this >> > > >> > in the table under "Migration Overview" >> > > >> > >> > > >> >> > > >> I don't feel that strongly about this, but wouldn't it be a good idea >> > > for MigrationState to have a different value for ZK-based clusters and >> > > KRaft-based clusters? If you have a bunch of clusters and you take an >> > > aggregate of this metric, it would be good to get a report of three >> > numbers: >> > > >> 1. unupgraded ZK >> > > >> 2. in progress upgrades >> > > >> 3. kraft >> > > >> >> > > >> I guess we could get that from examining some other metrics too, >> > > though. Not sure, what do you think? >> > > >> >> > > >> >> As you point out, the ZK brokers being upgraded will need to >> > contact >> > > the KRaft quorum in order to forward requests to there, once we are in >> > > migration mode. This raises a question: rather than changing the broker >> > > registration, can we have those brokers send an RPC to the kraft >> > controller >> > > quorum instead? This would serve to confirm that they can reach the >> > quorum. >> > > Then the quorum could wait for all of the brokers to check in this way >> > > before starting the migration (It would know all the brokers by looking >> > at >> > > /brokers) >> > > >> > >> > > >> > One of the motivations I had for putting the migration details in >> > the >> > > >> > broker registration is that it removes ordering constraints between >> > > >> > the brokers and controllers when starting the migration. If we set >> > the >> > > >> > brokers in migration mode before the KRaft quorum is available, they >> > > >> > will just carry on until the KRaft controller takes over the >> > > >> > controller leadership and starts sending UMR/LISR. >> > > >> > >> > > >> >> > > >> I agree that any scheme that requires complicated startup ordering is >> > > not a good idea. I was more thinking about having the zk-based brokers >> > > periodically send some kind of "I'm here" upgrade heartbeat to the >> > > controller quorum. >> > > >> >> > > >> > I wonder if we could infer reachability of the new KRaft controller >> > by >> > > >> > the brokers through ApiVersions? Once taking leadership, the active >> > > >> > KRaft controller could monitor its NodeApiVersions to see that each >> > of >> > > >> > the ZK brokers had reached it. Would this be enough to verify >> > > >> > reachability? >> > > >> > >> > > >> >> > > >> If the new KRaft controller has already taken leadership, it's too >> > late >> > > to do anything about brokers that can't reach it. Unless we want to >> > > implement automatic rollback to the pre-upgrade state, which sounds quite >> > > complex and error prone. >> > > >> >> > > >> So, to sum up: my proposal was that the ZK based brokers periodically >> > > (like maybe every 10 seconds) try to contact the quorum with an RPC >> > > containing their IBP and readiness state. I guess the main negative thing >> > > is that it would be more network traffic (but probably not significantly >> > > more.) The main positive thing is that we'd never start an upgrade >> > without >> > > first verifying the quorum configuration on the ZK-based brokers. A >> > neutral >> > > thing (neither good nor bad, I think) is that we'd have a new RPC rather >> > > than a znode change. >> > > >> >> > > >> best, >> > > >> Colin >> > > >> >> > > >> >> > > >> > -David >> > > >> > >> > > >> > >> > > >> > On Tue, Nov 8, 2022 at 6:15 PM Colin McCabe <cmcc...@apache.org> >> > > wrote: >> > > >> >> >> > > >> >> Hi David, >> > > >> >> >> > > >> >> Looks great. Some questions: >> > > >> >> >> > > >> >> I agree with Jun that it would be good to rename >> > > metadata.migration.enable to something more zk-specific. Maybe >> > > zk.metadata.migration.enable ? >> > > >> >> >> > > >> >> SImilarly, for MigrationRecord: can we rename this to >> > > ZkMigrationStateRecord? Then change MigrationState -> ZkMigrationState. >> > > >> >> >> > > >> >> With ZkMigrationStateRecord, one thing to keep in mind here is that >> > > we will eventually compact all the metadata logs into a snapshot. That >> > > snapshot will then have to keep alive the memory of the old migration. So >> > > it is not really a matter of replaying the old metadata logs (probably) >> > but >> > > a matter of checking to see what the ZkMigrationState is, which I suppose >> > > could be Optional<ZkMigrationState>. If it's not Optional.empty, we >> > already >> > > migrated / are migrating. >> > > >> >> >> > > >> >> For the /migration ZNode, is "last_update_time_ms" necessary? I >> > > thought ZK already tracked this information in the mzxid of the znode? >> > > >> >> >> > > >> >> It is true that technically it is only needed in UMR, but I would >> > > still suggest including KRaftControllerId in LeaderAndIsrRequest because >> > it >> > > will make debugging much easier. >> > > >> >> >> > > >> >> I would suggest not implementing the topic deletion state machine, >> > > but just deleting topics eagerly when in migration mode. We can implement >> > > this behavior change by keying off of whether KRaftControllerId is >> > present >> > > in LeaderAndIsrRequest. On broker startup, we'll be sent a full >> > > LeaderAndIsrRequest and can delete stray partitions whose IDs are not as >> > > expected (again, this behavior change would only be for migration mode) >> > > >> >> >> > > >> >> For existing KRaft controllers, will >> > > kafka.controller:type=KafkaController,name=MigrationState show up as 4 >> > > (MigrationFinalized)? I assume this is true, but it would be good to >> > spell >> > > it out. Sorry if this is answered somewhere else. >> > > >> >> >> > > >> >> As you point out, the ZK brokers being upgraded will need to >> > contact >> > > the KRaft quorum in order to forward requests to there, once we are in >> > > migration mode. This raises a question: rather than changing the broker >> > > registration, can we have those brokers send an RPC to the kraft >> > controller >> > > quorum instead? This would serve to confirm that they can reach the >> > quorum. >> > > Then the quorum could wait for all of the brokers to check in this way >> > > before starting the migration (It would know all the brokers by looking >> > at >> > > /brokers) >> > > >> >> >> > > >> >> best, >> > > >> >> Colin >> > > >> >> >> > > >> >> On Tue, Nov 8, 2022, at 07:23, David Arthur wrote: >> > > >> >> > Hello everyone, I'd like to start the vote on KIP-866. >> > > >> >> > >> > > >> >> > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration >> > > >> >> > >> > > >> >> > Thanks! >> > > >> >> > David Arthur >> > > >> > >> > > >> > >> > > >> > >> > > >> > -- >> > > >> > David Arthur >> > > > >> > > > >> > > > >> > > > -- >> > > > David Arthur >> > > >> > > > > > -- > David Arthur