Bump
On Mon, Apr 7, 2025 at 4:29 PM Mickael Maison <mickael.mai...@gmail.com> wrote:
>
> Hi David,
>
> DA1: Done
>
> DA2.1: Yes because we only have the directory ids in the metadata log.
> At the moment only brokers have the path -> id mappings.
>
> DA2.2: I proposed this mechanism because of DA2.1. Also this is
> consistent with other operations like updating log levels. If people
> would prefer to have dedicated APIs, I can adjust the proposal.
>
> DA4: Good idea, I added metrics to the KIP.
>
> Thanks,
> Mickael
>
> On Mon, Apr 7, 2025 at 4:23 PM Mickael Maison <mickael.mai...@gmail.com>
> wrote:
> >
> > Hi Jun,
> >
> > 10: Yes that's good idea. I've added that to the KIP.
> >
> > 11: Yes that was my plan. We need to check the metadata version and
> > reject updates to the configuration is the metadata version is too
> > low. I clarified that in the KIP.
> > Did you have anything else in mind (if you can remember what you
> > thought over 8 months ago!)? I don't think we need to expose a
> > feature.
> >
> > Thanks,
> > Mickael
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Jul 30, 2024 at 4:22 PM David Arthur <mum...@gmail.com> wrote:
> > >
> > > DA1: I like Jun's suggestion of using a wildcard, this would also help
> > > with
> > > the case I mentioned (cordon a whole broker, regardless of how many log
> > > dirs).
> > >
> > > DA2.1: Re: log dir names to UUID mapping -- do you mean a new
> > > CordonLogDirs
> > > RPC would need to send the UUID instead of the log dir name?
> > >
> > > DA2.2: In general, I don't really like the idea of using dynamic configs
> > > as
> > > an API for modifying state. I know we do it a lot, but I think it's an
> > > anti-pattern. It makes it difficult to evolve an API since you're stuck
> > > with the key/value config system. Also, the dynamic config code is a lot
> > > harder to navigate and understand than KafkaApis and MetadataImage (my
> > > opinion is probably biased here).
> > >
> > > DA3: I was thinking along the lines of this being a new feature, so we may
> > > want to just disable the code paths entirely with a config. For example,
> > > maybe we introduce a rare bug that breaks assignments, or broker
> > > registration, or whatever -- it's nicer to have a config to turn off a
> > > feature rather than having to downgrade. I guess this really depends on
> > > how
> > > tightly integrated the new feature is with existing code. Let's leave this
> > > out of the KIP for now and maybe revisit after the feature is done.
> > >
> > > DA4: Ok, thanks. So eventually the operator would un-cordon the brokers to
> > > allow normal placements to occur. Since we expect the operator to
> > > eventually come back and un-cordon the brokers, this means it will
> > > occasionally be forgotten (since it's a human task). WDYT about a new
> > > broker metric to show the number of cordoned log dirs? This would help
> > > operators set up monitors like "alert if cordoned count > 0 for 7 days" or
> > > similar.
> > >
> > > -David
> > >
> > > On Wed, Jul 24, 2024 at 1:21 PM Jun Rao <j...@confluent.io.invalid> wrote:
> > >
> > > > Hi, Mickael,
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > 10. A common case is only one log dir per broker. Could we support sth
> > > > like --add-config cordoned.log.dirs=* to make it more convenient for
> > > > this
> > > > case?
> > > >
> > > > 11. Since we changed the metadata record format, should we gate the new
> > > > configuration based on a new metadata version?
> > > >
> > > > Jun
> > > >
> > > > On Wed, Jul 17, 2024 at 9:18 AM Mickael Maison
> > > > <mickael.mai...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Kamal,
> > > > >
> > > > > Good spot, yes this is a typo. The flexibleVersions stays as "0+".
> > > > > Fixed
> > > > >
> > > > > Thanks,
> > > > > Mickael
> > > > >
> > > > > On Wed, Jul 17, 2024 at 6:14 PM Mickael Maison
> > > > > <mickael.mai...@gmail.com
> > > > >
> > > > > wrote:
> > > > > >
> > > > > > Hi David,
> > > > > >
> > > > > > DA1: It's a point I considered, especially being able to cordon a
> > > > > > whole broker. With the current proposal you just need to set
> > > > > > cordoned.log.dirs to the same value as log.dirs. That does not seem
> > > > > > excessively difficult.
> > > > > >
> > > > > > DA2: I did consider using a new metadata record like we do for
> > > > > > fence/unfence. Since I don't expect cordoned log dirs to change very
> > > > > > frequently, and the size should be small, I opted to reuse the
> > > > > > BrokerRegistrationRecord metadata record. At the moment I guess it
> > > > > > was
> > > > > > mostly for the convenience while prototyping. Semantically it
> > > > > > probably
> > > > > > makes sense to have separate records. Your further point suggest
> > > > > > design changes in the mechanism as a whole, so let's discuss these
> > > > > > first and we can return to the exact metadata records after.
> > > > > > I find the idea of having dedicated RPCs interesting. One of the
> > > > > > reasons I piggybacked on the heartbeating process is for the mapping
> > > > > > between log directory names and their UUIDs. Currently the mapping
> > > > > > only exists on each broker instance. So if we wanted a dedicated
> > > > > > RPC,
> > > > > > we would first need to change how we manage the log directory to
> > > > > > UUID
> > > > > > mappings. I guess this could be done via the BrokerRegistration API.
> > > > > > I'm not sure about storing additional metadata (reason, timestamp).
> > > > > > We
> > > > > > currently don't do that for any operations
> > > > > > (AlterPartitionReassignments, UnregisterBroker). Typically these are
> > > > > > stored in the tools used by the operators to drive these operations.
> > > > > > You bring another interesting point about the ability to cordon
> > > > > > brokers/log directories while they are offline. It's not something
> > > > > > the
> > > > > > current proposal supports. I'm not sure how useful this would turn
> > > > > > out
> > > > > > it practice. In experience is that brokers are mostly online, so I'd
> > > > > > expect the need to do so relatively rare. This also kind of loops
> > > > > > back
> > > > > > to KAFKA-17094 [0] and the discussion Gantigmaa started on the dev
> > > > > > list [1] about being able to identify the ids of offline (but still
> > > > > > registered) brokers.
> > > > > >
> > > > > > DA3: With the current proposal, I don't see a reason why you would
> > > > > > want to disable the new behavior. If you don't want to use it, you
> > > > > > have nothing to do. It's opt-in as you need to set cordoned.log.dirs
> > > > > > on some brokers to get the new behavior. If you don't want it
> > > > > > anymore,
> > > > > > you should unset cordoned.log.dirs. Can you explain why this would
> > > > > > not
> > > > > > work?
> > > > > >
> > > > > > DA4: Yes
> > > > > >
> > > > > > 0: https://issues.apache.org/jira/browse/KAFKA-17094
> > > > > > 1: https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo
> > > > > >
> > > > > > Thanks,
> > > > > > Mickael
> > > > > >
> > > > > >
> > > > > > On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash
> > > > > > <kamal.chandraprak...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Mickael,
> > > > > > >
> > > > > > > In the BrokerHearbeatRequest.json, the flexibleVersions are bumped
> > > > from
> > > > > > > "0+" to "1+". Is it a typo?
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 12, 2024 at 11:42 PM David Arthur <mum...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Mickael, thanks for the KIP! I think this could be quite a
> > > > > > > > useful
> > > > > feature.
> > > > > > > >
> > > > > > > > DA1: Having to know each of the log dirs for a broker seems a
> > > > > > > > bit
> > > > > > > > inconvenient for cases where we want to cordon off a whole
> > > > > > > > broker.
> > > > I
> > > > > do
> > > > > > > > think having the ability to cordon off a specific log dir is
> > > > > > > > useful
> > > > > for
> > > > > > > > JBOD, but I imagine a common case might be to cordon off the
> > > > > > > > whole
> > > > > broker.
> > > > > > > >
> > > > > > > > DA2: Looks like the new "cordoned.log.dirs" can be configured
> > > > > statically
> > > > > > > > and updated dynamically per-broker. What do you think about a
> > > > > > > > new
> > > > > metadata
> > > > > > > > record and RPC instead of using a config? From my understanding,
> > > > the
> > > > > > > > BrokerRegistration/Heartbeat is more about the lifecycle of a
> > > > broker
> > > > > > > > whereas cordoning a broker is an operator driven action. It
> > > > > > > > might
> > > > > make
> > > > > > > > sense to have a separate record for this. We could include
> > > > additional
> > > > > > > > fields like a timestamp, a reason/comment field (e.g.,
> > > > > "decommissioning",
> > > > > > > > "disk failure", "new broker" etc), stuff like that.
> > > > > > > >
> > > > > > > > This would also allow cordoning to be done while a broker is
> > > > offline
> > > > > or
> > > > > > > > before it has been provisioned. Not sure how likely that is, but
> > > > > might be
> > > > > > > > useful?
> > > > > > > >
> > > > > > > > DA3: Can we consider having a configuration to enable/disable
> > > > > > > > the
> > > > new
> > > > > > > > replica placer behavior? This would be separate from the new
> > > > > > > > MetadataVersion for the RPC/record changes.
> > > > > > > >
> > > > > > > > DA4: In the Motivation section, you mention the cluster
> > > > > > > > expansion
> > > > > scenario.
> > > > > > > > For this scenario, is the expectation that the operator will
> > > > > > > > cordon
> > > > > off the
> > > > > > > > existing full brokers so placements only happen on the new
> > > > > > > > brokers?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > David
> > > > > > > >
> > > > > > > > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison <
> > > > > mickael.mai...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Kamal,
> > > > > > > > >
> > > > > > > > > Thanks for taking a look at the KIP!
> > > > > > > > >
> > > > > > > > > I briefly considered that option initially but I found it not
> > > > very
> > > > > > > > > practical once you have more than a few cordoned log
> > > > > > > > > directories.
> > > > > > > > > I find your example is already not very easy to read, and it
> > > > > > > > > only
> > > > > has
> > > > > > > > > 2 entries. Also if the configuration is at the cluster level
> > > > it'sis
> > > > > > > > > not easy to see if a broker has all its log directories
> > > > > > > > > cordoned,
> > > > > and
> > > > > > > > > you still need to describe a specific broker's configuration
> > > > > > > > > to
> > > > > find
> > > > > > > > > the "name" of a log directory you want to cordon.
> > > > > > > > >
> > > > > > > > > I think an easy way to get an overall view of the cordoned log
> > > > > > > > > directories/brokers will be via the kafka-log-dirs.sh tool. I
> > > > > > > > > am
> > > > > also
> > > > > > > > > considering adding metrics like we have today for
> > > > > LogDirectoryOffline
> > > > > > > > > to ease monitoring.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Mickael
> > > > > > > > >
> > > > > > > > > On Thu, Jul 11, 2024 at 8:41 PM Kamal Chandraprakash
> > > > > > > > > <kamal.chandraprak...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Mickael,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP!
> > > > > > > > > >
> > > > > > > > > > This is a useful feature which helps to decommission the
> > > > > > > > > > nodes
> > > > by
> > > > > > > > > > essentially
> > > > > > > > > > creating a new replica exclude broker list.
> > > > > > > > > >
> > > > > > > > > > To cordon a list of brokers, we have to apply the config on
> > > > each
> > > > > of the
> > > > > > > > > > broker nodes
> > > > > > > > > > and similarly to see the list of cordoned brokers, we have
> > > > > > > > > > to
> > > > > either
> > > > > > > > > query
> > > > > > > > > > individual broker
> > > > > > > > > > config in the cluster or query each of the broker log
> > > > directory.
> > > > > > > > > >
> > > > > > > > > > Can we move the configuration from the broker to cluster
> > > > > > > > > > level?
> > > > > (eg)
> > > > > > > > > >
> > > > > > > > > > bin/kafka-configs.sh --bootstrap-server localhost:9092
> > > > > > > > --broker-defaults
> > > > > > > > > > --alter --add-config
> > > > > > > > > > cordoned.log.dirs=[broker.id
> > > > > .0]:[log.dir.0],[broker.id.1]:[log.dir.1]
> > > > > > > > > >
> > > > > > > > > > This will provide a consistent view of the list of cordoned
> > > > > brokers in
> > > > > > > > > the
> > > > > > > > > > cluster.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Kamal
> > > > > > > > > >
> > > > > > > > > > On Wed, Jul 10, 2024 at 7:53 PM Mickael Maison <
> > > > > > > > mickael.mai...@gmail.com
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Luke,
> > > > > > > > > > >
> > > > > > > > > > > 4. You're right this scenario can happen. In this case I
> > > > think
> > > > > the
> > > > > > > > > > > broker should enforce its new state and not create the
> > > > replica
> > > > > as all
> > > > > > > > > > > its log directories are now cordoned. The replica will be
> > > > > offline and
> > > > > > > > > > > an administrator would need to reassign it to another
> > > > broker. I
> > > > > > > > expect
> > > > > > > > > > > most users will rely on kafka-configs.sh to manage the
> > > > > cordoned log
> > > > > > > > > > > directories instead of updating the broker properties, so
> > > > > > > > > > > it
> > > > > should
> > > > > > > > > > > not be a common issue.
> > > > > > > > > > >
> > > > > > > > > > > 6. To resolve this issue, the user can uncordon a log
> > > > > directory and
> > > > > > > > > > > retry an operation that triggers the creation of the
> > > > > > > > > > > internal
> > > > > topics.
> > > > > > > > > > > For example start a consumer using a group should make the
> > > > > broker
> > > > > > > > > > > retry creating the __consumer_offsets topic.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Mickael
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jul 10, 2024 at 4:14 PM Mickael Maison <
> > > > > > > > > mickael.mai...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Chia-Ping,
> > > > > > > > > > > >
> > > > > > > > > > > > Question 1) Yes that's a good summary. I'd also add that
> > > > > managing
> > > > > > > > > > > > cordoned log directories is intended to be done by
> > > > > > > > > > > > cluster
> > > > > > > > > > > > administrators who also know about operations
> > > > > > > > > > > > in-progress
> > > > or
> > > > > > > > planned
> > > > > > > > > > > > such as scaling or adding/removing log directories. In
> > > > > practice you
> > > > > > > > > > > > can't expect users to provide "good" explicit partition
> > > > > assignments
> > > > > > > > > as
> > > > > > > > > > > > they are not aware of the cluster operations.
> > > > > > > > > > > >
> > > > > > > > > > > > Question 2) I don't see an actual question :)
> > > > > > > > > > > >
> > > > > > > > > > > > Question 3) My current proposal is that in that case,
> > > > > brokers will
> > > > > > > > > > > > ignore the preferred log directory and place the new
> > > > replica
> > > > > on a
> > > > > > > > log
> > > > > > > > > > > > directory that is not cordoned. To use
> > > > > > > > > > > > AlterReplicaLogDirs
> > > > > you need
> > > > > > > > > > > > ALTER permission on the CLUSTER resource. In most
> > > > > environments
> > > > > > > > users
> > > > > > > > > > > > don't have permissions to use that API. Or if they do,
> > > > > > > > > > > > I'd
> > > > > expect
> > > > > > > > > them
> > > > > > > > > > > > to also have ALTER_CONFIGS on the BROKER resource and be
> > > > > able to
> > > > > > > > > > > > change the cordoned log directories configuration.
> > > > > > > > > > > >
> > > > > > > > > > > > Question 4) If we don't include the CordonedLogDirs
> > > > > > > > > > > > field
> > > > in
> > > > > > > > > > > > BrokerRegistrationRequest as you said there's a small
> > > > window
> > > > > when
> > > > > > > > the
> > > > > > > > > > > > controller would not know if a broker can be used for
> > > > > assignments.
> > > > > > > > If
> > > > > > > > > > > > all log directories are cordoned the controller should
> > > > > > > > > > > > not
> > > > > use that
> > > > > > > > > > > > broker for assignments.
> > > > > > > > > > > >
> > > > > > > > > > > > Question 5) In KRaft mode offline brokers are still
> > > > > registered, and
> > > > > > > > > > > > can be used for partition assignment. So we need to
> > > > > > > > > > > > persist
> > > > > the
> > > > > > > > > > > > cordoned log directories too to exclude brokers that
> > > > > > > > > > > > don't
> > > > > have
> > > > > > > > > > > > uncordoned log directories.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Mickael
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:25 PM Luke Chen <
> > > > > show...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Mickael,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the response.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. Cordoned log directories are persisted to the
> > > > > metadata log
> > > > > > > > > via the
> > > > > > > > > > > > > RegisterBrokerRecord, BrokerRegistrationChangeRecord
> > > > > records. If
> > > > > > > > a
> > > > > > > > > > > > > broker is offline, the controller will use the latest
> > > > > known state
> > > > > > > > > of
> > > > > > > > > > > > > the broker to determine the broker's cordoned log
> > > > > directories.
> > > > > > > > I've
> > > > > > > > > > > > > added a sentence clarifying this point.
> > > > > > > > > > > > >
> > > > > > > > > > > > > OK, so if the broker A goes offline, and the
> > > > > > > > > > > > > controller
> > > > is
> > > > > in
> > > > > > > > > "fenced"
> > > > > > > > > > > > > state, without any cordoned log dirs, then some topic
> > > > > created and
> > > > > > > > > > > assigned
> > > > > > > > > > > > > to broker A. Later, broker A starts up with all its
> > > > > > > > > > > > > log
> > > > > dirs
> > > > > > > > > cordoned
> > > > > > > > > > > > > configured. At this situation, will the broker A
> > > > > > > > > > > > > create
> > > > the
> > > > > > > > > partitions?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 6. I'm leaning towards considering that scenario a
> > > > > > > > configuration
> > > > > > > > > > > > > error. If all log directories are cordoned before the
> > > > > internal
> > > > > > > > > topics
> > > > > > > > > > > > > are created, then the broker will not be able to
> > > > > > > > > > > > > create
> > > > > them.
> > > > > > > > This
> > > > > > > > > > > > > seems like a pretty strange scenario, where it's the
> > > > first
> > > > > time
> > > > > > > > you
> > > > > > > > > > > > > start a broker, you've cordoned all its log directory,
> > > > and
> > > > > the
> > > > > > > > > > > > > internal topics (offsets and transactions) have not
> > > > > > > > > > > > > yet
> > > > > been
> > > > > > > > > created
> > > > > > > > > > > > > in the rest of the cluster.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, I agree that this should be a configuration
> > > > > > > > > > > > > error.
> > > > > > > > > > > > > So the follow-up question is: Suppose users encounter
> > > > this
> > > > > issue,
> > > > > > > > > and
> > > > > > > > > > > how
> > > > > > > > > > > > > could they resolve it?
> > > > > > > > > > > > > Uncordon the log dir dynamically using
> > > > > > > > > > > > > kafka-configs.sh?
> > > > > Will the
> > > > > > > > > > > > > uncordoning config change recreate the partitions we
> > > > didn't
> > > > > > > > create
> > > > > > > > > > > earlier
> > > > > > > > > > > > > because of log dir cordoned?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The metadata log is different (not managed by
> > > > > LogManager), so I
> > > > > > > > > think
> > > > > > > > > > > > > it should always be created regardless if its log
> > > > > directory is
> > > > > > > > > > > > > cordoned or not.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I agree we should treat "__cluster_metadata"
> > > > > > > > > > > > > differently.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > Luke
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:42 AM Mickael Maison <
> > > > > > > > > > > mickael.mai...@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Luke,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. isCordoned() is a new method on
> > > > > > > > > > > > > > LogDirDescription.
> > > > It
> > > > > does
> > > > > > > > not
> > > > > > > > > > > take
> > > > > > > > > > > > > > any arguments. It just returns true if this log
> > > > > directory the
> > > > > > > > > > > > > > LogDirDescription represents is cordoned.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3. Sorry that was a typo. This method will only
> > > > > > > > > > > > > > return
> > > > a
> > > > > log
> > > > > > > > > > > directory
> > > > > > > > > > > > > > that is not cordoned. Fixed
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4. Cordoned log directories are persisted to the
> > > > > metadata log
> > > > > > > > > via the
> > > > > > > > > > > > > > RegisterBrokerRecord, BrokerRegistrationChangeRecord
> > > > > records.
> > > > > > > > If
> > > > > > > > > a
> > > > > > > > > > > > > > broker is offline, the controller will use the
> > > > > > > > > > > > > > latest
> > > > > known
> > > > > > > > > state of
> > > > > > > > > > > > > > the broker to determine the broker's cordoned log
> > > > > directories.
> > > > > > > > > I've
> > > > > > > > > > > > > > added a sentence clarifying this point.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 5. Yes a log directory can be uncordoned. You can
> > > > either
> > > > > update
> > > > > > > > > the
> > > > > > > > > > > > > > properties file and restart the broker or
> > > > > > > > > > > > > > dynamically
> > > > > change
> > > > > > > > the
> > > > > > > > > > > value
> > > > > > > > > > > > > > at runtime using kafka-configs. I've added a
> > > > > > > > > > > > > > paragraph
> > > > > about it
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > KIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 6. I'm leaning towards considering that scenario a
> > > > > > > > configuration
> > > > > > > > > > > > > > error. If all log directories are cordoned before
> > > > > > > > > > > > > > the
> > > > > internal
> > > > > > > > > topics
> > > > > > > > > > > > > > are created, then the broker will not be able to
> > > > > > > > > > > > > > create
> > > > > them.
> > > > > > > > > This
> > > > > > > > > > > > > > seems like a pretty strange scenario, where it's the
> > > > > first time
> > > > > > > > > you
> > > > > > > > > > > > > > start a broker, you've cordoned all its log
> > > > > > > > > > > > > > directory,
> > > > > and the
> > > > > > > > > > > > > > internal topics (offsets and transactions) have not
> > > > > > > > > > > > > > yet
> > > > > been
> > > > > > > > > created
> > > > > > > > > > > > > > in the rest of the cluster.
> > > > > > > > > > > > > > The metadata log is different (not managed by
> > > > > LogManager), so I
> > > > > > > > > think
> > > > > > > > > > > > > > it should always be created regardless if its log
> > > > > directory is
> > > > > > > > > > > > > > cordoned or not.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Mickael
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 3:48 PM Chia-Ping Tsai <
> > > > > > > > > chia7...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > hi Mickael
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > That is totally a good idea, but I have a question
> > > > > about the
> > > > > > > > > > > > > > implementation
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Do we consider making pluggable ReplicaPlacer
> > > > > (KIP-660) first
> > > > > > > > > and
> > > > > > > > > > > then
> > > > > > > > > > > > > > add
> > > > > > > > > > > > > > > another impl of ReplicaPlacer to offer cordon
> > > > > mechanism?
> > > > > > > > Noted
> > > > > > > > > that
> > > > > > > > > > > > > > > `ReplicaPlacer` can implement Reconfigurable to
> > > > > > > > > > > > > > > get
> > > > > updated
> > > > > > > > at
> > > > > > > > > > > runtime.
> > > > > > > > > > > > > > > That is similar to KIP-1066 - change
> > > > cordoned.log.dirs
> > > > > > > > through
> > > > > > > > > > > configs
> > > > > > > > > > > > > > > update.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The benefit is to let users have their optimized
> > > > > policy for
> > > > > > > > > > > specific
> > > > > > > > > > > > > > > scenario. Also, it can avoid that we add more and
> > > > more
> > > > > > > > > mechanism
> > > > > > > > > > > to our
> > > > > > > > > > > > > > > code base. Of course we can merge the mechanism
> > > > > > > > > > > > > > > which
> > > > > can be
> > > > > > > > > used
> > > > > > > > > > > by 99%
> > > > > > > > > > > > > > > users :smile
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Chia-Ping
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Luke Chen <show...@gmail.com> 於 2024年7月9日 週二
> > > > 下午9:07寫道:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Mickael,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the KIP!
> > > > > > > > > > > > > > > > This is a long waiting feature for many users!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Questions:
> > > > > > > > > > > > > > > > 1. I think piggyback the
> > > > > > > > > > > > > > > > "BrokerHeartbeatRequest"
> > > > to
> > > > > > > > forward
> > > > > > > > > the
> > > > > > > > > > > > > > corden log
> > > > > > > > > > > > > > > > dir to controller makes sense to me.
> > > > > > > > > > > > > > > > We already did similar things for fence,
> > > > > > > > > > > > > > > > controller
> > > > > > > > shutdown,
> > > > > > > > > > > failed
> > > > > > > > > > > > > > log
> > > > > > > > > > > > > > > > dir...etc.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. In the admin API, what parameters will the
> > > > > > > > > > > > > > > > new
> > > > > added
> > > > > > > > > > > isCordoned()
> > > > > > > > > > > > > > method
> > > > > > > > > > > > > > > > take?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. In the KIP, we said:
> > > > > > > > > > > > > > > > "defaultDir(): This method will not return the
> > > > > > > > > > > > > > > > Uuid
> > > > > of a
> > > > > > > > log
> > > > > > > > > > > directory
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > is not cordoned."
> > > > > > > > > > > > > > > > --> It's hard to understand. Does that mean we
> > > > > > > > > > > > > > > > will
> > > > > only
> > > > > > > > > return
> > > > > > > > > > > > > > cordoned
> > > > > > > > > > > > > > > > log dir?
> > > > > > > > > > > > > > > > From the current java doc of the interface, it
> > > > > doesn't look
> > > > > > > > > > > right:
> > > > > > > > > > > > > > > > "Get the default directory for new partitions
> > > > placed
> > > > > in a
> > > > > > > > > given
> > > > > > > > > > > > > > broker."
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. Currently, if a broker is registered and
> > > > > > > > > > > > > > > > then go
> > > > > > > > offline.
> > > > > > > > > In
> > > > > > > > > > > this
> > > > > > > > > > > > > > state,
> > > > > > > > > > > > > > > > the controller will still distribute partitions
> > > > > > > > > > > > > > > > to
> > > > > this
> > > > > > > > > broker.
> > > > > > > > > > > > > > > > So, if now, the broker get startup with
> > > > > "cordoned.log.dirs"
> > > > > > > > > set,
> > > > > > > > > > > what
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > happen?
> > > > > > > > > > > > > > > > Will the newly assigned partitions be created
> > > > > successfully
> > > > > > > > or
> > > > > > > > > > > not?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 5. I think after a log dir get cordoned, we can
> > > > > always
> > > > > > > > > uncordon
> > > > > > > > > > > it,
> > > > > > > > > > > > > > right?
> > > > > > > > > > > > > > > > I think we should mention it in the KIP.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 6. If a broker is startup with
> > > > > > > > > > > > > > > > "cordoned.log.dirs"
> > > > > set, and
> > > > > > > > > does
> > > > > > > > > > > that
> > > > > > > > > > > > > > mean
> > > > > > > > > > > > > > > > the internal topics partitions (ex:
> > > > > __consumer_offsets)
> > > > > > > > > cannot be
> > > > > > > > > > > > > > created,
> > > > > > > > > > > > > > > > either?
> > > > > > > > > > > > > > > > Also, if this log dir is happen to be the
> > > > > > > > > > > > > > > > metadata
> > > > > log dir,
> > > > > > > > > what
> > > > > > > > > > > will
> > > > > > > > > > > > > > > > happen to the metadata topic creation?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > > > > Luke
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 12:12 AM Mickael Maison <
> > > > > > > > > > > > > > mickael.mai...@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for taking a look.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Yes you're right, I meant
> > > > > AlterPartitionReassignments.
> > > > > > > > > Fixed.
> > > > > > > > > > > > > > > > > - That's a good idea. I was expecting users to
> > > > > discover
> > > > > > > > > > > cordoned log
> > > > > > > > > > > > > > > > > directories by describing broker
> > > > > > > > > > > > > > > > > configurations.
> > > > > But
> > > > > > > > being
> > > > > > > > > > > able to
> > > > > > > > > > > > > > > > > also get this information when describing log
> > > > > directories
> > > > > > > > > makes
> > > > > > > > > > > > > > sense.
> > > > > > > > > > > > > > > > > I've added that to the KIP.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Mickael
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Jul 5, 2024 at 8:05 AM Haruki Okada <
> > > > > > > > > > > ocadar...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you for the KIP.
> > > > > > > > > > > > > > > > > > The motivation sounds make sense to me.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have a few questions:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - [nits] "AlterPartitions request" in Error
> > > > > handling
> > > > > > > > > section
> > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > "AlterPartitionReassignments request"
> > > > > > > > > > > > > > > > > > actually,
> > > > > right?
> > > > > > > > > > > > > > > > > > - Don't we need to include cordoned
> > > > > > > > > > > > > > > > > > information
> > > > > in
> > > > > > > > > > > DescribeLogDirs
> > > > > > > > > > > > > > > > > response
> > > > > > > > > > > > > > > > > > too? Some tools (e.g. CruiseControl) need to
> > > > > have a way
> > > > > > > > > to
> > > > > > > > > > > know
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > > broker/log-dirs are cordoned to generate
> > > > > partition
> > > > > > > > > > > reassignment
> > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2024年7月4日(木) 22:57 Mickael Maison <
> > > > > > > > > mickael.mai...@gmail.com
> > > > > > > > > > > >:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I'd like to start a discussion on KIP-1066
> > > > that
> > > > > > > > > introduces
> > > > > > > > > > > a
> > > > > > > > > > > > > > > > mechanism
> > > > > > > > > > > > > > > > > > > to cordon log directories and brokers.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > Mickael
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > ========================
> > > > > > > > > > > > > > > > > > Okada Haruki
> > > > > > > > > > > > > > > > > > ocadar...@gmail.com
> > > > > > > > > > > > > > > > > > ========================
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > David Arthur
> > > > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > David Arthur