On Mon, Jun 26, 2023, at 05:08, Igor Soarez wrote:
> Hi Colin,
>
> Thanks for your support with getting this over the line and that’s
> great re the preliminary pass! Thanks also for sharing your
> thoughts, I've had a careful look at each of these and sharing my
> comments below.
>
> I agree, it is important to avoid a perf hit on non-JBOD.
> I've opted for tagged fields in:
>
>  - Assignment.Directory in PartitionRecord and PartitionChangeRecord
>  - OnlineLogDirs in RegisterBrokerRecord,
>    BrokerRegistrationChangeRecord and BrokerRegistrationRequest
>  - OfflineLogDirs in BrokerHeartbeatRequest
>
> I don't think we should use UUID.Zero to refer to the first log
> directory because this value also indicates "unknown, or no log dir
> yet assigned". We can work with the default value by gating on JBOD
> configuration — determined by log.dirs (Broker side) and
> BrokerRegistration (Controller side).

Hi Igor,

If we want to reserve zero for unknown, we can do that. Then "1" can be 
reserved for the first log directory. But it's unclear to me why we need an 
"unknown" value (see below).

> In non-JBOD:
>  - The single logdir won't have a UUID
>  - BrokerRegistration doesn't list any log dir UUIDs
>  - AssignReplicasToDirs is never used
>
> Directory reassignment will work the same way as in ZK mode, but
> with the difference that the promotion of the future replica
> requires an AssignReplicasToDirs request to update the assignment.
> I've tried to improve the description of this operation and
> included a diagram to illustrate it.
>
> I've renamed LogDirsOfflined to OfflineLogDirs in the
> BrokerHeartbeatRequest. This field was named differently because
> it's only used for log directories that have become offline but are
> not yet represented as offline in the metadata, from the Broker's
> point of view — as opposed to always listing the full set of offline
> log dirs.
>
> I don't think we should identify log directories using system
> paths, because those may be arbitrary. A set of storage devices may
> be mounted and re-mounted on the same set of system paths using a
> different order every time. Kafka only cares about the content, not
> the location of the log directories.

The problem is that the only thing the broker can really reliably determine is 
the UUIDs of the directories that it can read. The UUID of directories that it 
can't read can only be inferred. And the inference can't be done with 100% 
reliability.

>
> I think I have overcomplicated this by trying to identify offline
> log directories. In ZK mode we don't care to do this, and we
> shouldn't do it in KRaft either.

I agree that identifying the UUIDs of the directories that are present (as 
opposed to those that are absent) seems much cleaner. As I mentioned above, we 
can't reliably find missing or broken directories anyway.

> What we need to know is if there
> are any offline log directories, to prevent re-streaming the offline
> replicas into the remaining online log dirs. In ZK mode, the 'isNew'
> flag is used to prevent the Broker from creating partitions when any
> logdir is offline unless they're new. In KRaft the Controller can
> reset the assignment to UUID.Zero for replicas in log dirs not
> listed in the broker registration only when the broker registration
> indicates no offline log dirs.

In the previous email I asked, "who is responsible for assigning replicas to 
broker directories?" Can you clarify what the answer is to that? If the answer 
is the controller, there is no need for an "unknown" state for assignments, 
since the controller can simply choose an assignment immediately when it 
creates a replica.

The broker has access to the same metadata, of course, and already knows what 
directory a replica is supposed to be in. If that directory doesn't exist, then 
the replica is offline. There is no need to modify replicas to have an 
"unknown" assignment state.

best,
Colin


> So I've updated the KIP to:
>  - Remove directory.ids from meta.properties
>  - Change OfflineLogDirs in RegisterBrokerRecord,
>    BrokerRegistrationChangeRecord and BrokerRegistrationRequest
>    to a boolean
>  - Describe this behavior in the Controller in Broker
>
> It’s been a lot of work to get here and I’m similarly excited to get
> this released soon! The vote has been open over the last week
> and I'm happy to give it another two to get any other thoughts without
> rushing. Thanks again for the support and input!
>
> Best,
>
> --
> Igor

Reply via email to