Thank you for the continuous work, I have some small problems related to the implementation details:
1. We have decided to add a new metadata.version, and you have said that "We should also avoid gating on metadata.version to include the `OnlineLogDirectories` field in the broker registration, the existing versioning in the message schemas should suffice to guarantee compatibility", I think this is a big difference from what we had done in the past design, it's better to add this point in "Compatibility, Deprecation, and Migration Plan " section. 2. Related to (1), an important thing about migration is that we should first upgrade controllers before upgrade brokers since broker will only send `BrokerRegisterationRequest` once and we should use the new format and the controller should recoganize the new format, this is difficult in co-resident mode, so you should point out that we only support distributed mode similar to KIP-866. 3. What will happen on receiving "AssignReplicasToDirsRequest"? this should be described in "Controller" section, for example, generating `PartitionChangeRecord` or return error code. Apart from these problems, this KIP LGTM, -- Best, Ziming > On Sep 28, 2022, at 04:59, Igor Soarez <i...@soarez.me> wrote: > > Hi Ziming, > > Thank you for having another look at this KIP, and please accept my apologies > as to my delay in replying. > > The migration introduces JBOD support, so before the migration there should > only be one log directory configured per broker. This assumption simplifies > how the controller learns about the initial log directory placement for > existing partitions. The first broker registration referencing log > directories must declare a single log directory - which must be online as > brokers cannot start without any online log dirs. If there are no previous > log dirs registered for this broker, the controller assigns all existing > partitions in the broker to this single directory. > > Since we assume a single log dir per broker as a migration starting point and > brokers cannot start without any online log directories, the > `OfflineLogDirectories` field simply cannot be used in the first registration > after the upgrade. However, the issue you described applies to > `OnlineLogDirectories`. If use of this field is gated by metadata.version > then we do require a double roll for brokers to make their first registration > with a log directory. > > As you describe, we could move the list of online log directories from the > broker registration request to the heartbea request, but I believe we > shouldn't do this because: > a) The intention is to address a transient problem — the migration into the > feature, which should happen only once — by accepting a long term commitment > as to how the online log dirs are signaled. > b) While the log directory status can change at any point from online to > offline, the list of existing log directories is static for broker's > lifetime, so it makes little sense to send this information over and over > again to the controller. > > We should also avoid gating on metadata.version to include the > `OnlineLogDirectories` field in the broker registration. If the controllers > are upgraded first, the existing versioning in the message schemas should > suffice to guarantee compatibility. After the controllers are upgraded, the > brokers can then be upgraded and each of them can register their single > online log directory from the first instantiation. > > Please, let me know if this makes sense and any other thoughts you might have. > > Best, > > -- > Igor > > On Thu, Sep 1, 2022, at 2:04 AM, deng ziming wrote: >> Hi Igor, >> >> I think this KIP can solve the current problems, I have some problems >> relating to the migration section. >> >> Since we have bumped broker RPC version and metadata record version, >> there will be some problems between brokers/controllers of different >> versions. In ZK mode we use IBP as a flag to help solve this, in KRaft >> mode we use a feature flag(metadata.version) as a flag for using new >> RPC/metadata or not. >> >> Assuming that we are upgrading from 3.3 to 3.4, firstly the finalized >> version of metadata.version is 3.3, brokers will use version 1 of >> `BrokerRegistrationRequest` which contains no `OfflineLogDirectories`, >> finally the finalized version of metadata.version is 3.4, but brokers >> will no longer send `BrokerRegistrationRequest` unless we restart the >> broker, so controllers can’t be aware of the `OfflineLogDirectories` of >> each broker, so we should reconsider the suggestion of Jason to use >> `BrokerHeartbeatRequest` to communicate `OfflineLogDirectories`. >> >> Of course this problem can be solved through a double roll(restart >> broker twice when upgrading), but we should trying to avoid it since we >> now have feature flag. >> >> One solution is that we include `OfflineLogDirectories` both in >> `BrokerRegistrationRequest` and `BrokerHeartbeatRequest` requests, when >> upgrading from 3.3 to 3.4 the >> `BrokerRegistrationRequest.OfflineLogDirectories` is empty whereas when >> upgrading from 3.4 to 3.5 it will not be empty. And maybe we can also >> remove `LogDirectoriesOfflineRequest` you proposed in this KIP. >> >> -- >> Best, >> Ziming >> >> >>> On Aug 18, 2022, at 11:24 PM, Igor Soarez <i...@soarez.me> wrote: >>> >>> Hi Ziming, >>> >>> I'm sorry it took me a while to reply. >>> >>> Thank you for having a look at this KIP and providing feedback. >>> >>>> 1. We have a version field in meta.properties, currently it’s 1, and we can >>>> set it to 2 in this KIP, and we can give an example of server.properties >>>> and >>>> it’s corresponding meta.properties generated by the storage command tool. >>> >>>> 2. When using storage format tool we should specify cluster-id, it will be >>>> an >>>> arduous work if we should manually specify directory-ids for all log dirs, >>>> I think you can make it more clear about this change that the directory-ids >>>> are generated automatically. >>> >>> Thank you for these suggestions. I've updated the KIP to: >>> >>> * include an example >>> * clarify that the version field would be changed to 2 >>> * clarify that the log directory UUIDs are automatically generated >>> >>>> 3. When controller place a replica, it will select a broker as well as a >>>> log >>>> directory, the latter is currently accomplished in the broker side, so this >>>> will be a big change? >>> >>> I think there can be benefits, as Jason described previously, if we change >>> how >>> log directories are assigned as follow-up work. >>> >>> From a codebase surface area perspective, it is definitely a big change >>> because there are many models, types and interfaces that assume replicas are >>> identified solely by a broker id. >>> That will have to change to include the directory UUID, many lines of code >>> will >>> be affected. >>> >>> But in terms of behavior it shouldn't be a big change at all. Brokers >>> currently >>> select the log directory with the least logs in it. This isn't a very nice >>> policy, as logs can have wildly different sizes, and log directories can >>> have >>> different capacities. But it is a policy that the controller can keep. >>> >>> If we decide to extend the selection policy and keep it in the broker, >>> the broker will continue to be able to override the controller's selection >>> of log directory, using the `AssignReplicasToDirectories` RPC. >>> >>>> 4. When handling log directory failures we will update Leader and ISR using >>>> the existing replica state machine, what is this state machine referring >>>> to, >>>> do you mean the AlterPartition RPC? >>> >>> No, I mean that it will use the same rules and mechanism >>> (`PartitionChangeBuilder`) that is used when a broker is fenced, shutdown or >>> unregistered. >>> >>> I think maybe the term "replica state machine" isn't the very appropriate. >>> I've updated the KIP to rephrase this section. >>> >>> Thanks, >>> >>> -- >>> Igor