[ https://issues.apache.org/jira/browse/KAFKA-15490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandre Dupriez updated KAFKA-15490:
--------------------------------------
    Description:
There is a small bug/typo in the handling of I/O errors when writing the broker metadata checkpoint in {{KafkaServer}}. The path provided to the log dir failure channel is the full path of the checkpoint file, whereas only the log directory is expected ([source|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/server/KafkaServer.scala#L958C8-L961C8]).
{code:java}
case e: IOException =>
  val dirPath = checkpoint.file.getAbsolutePath
  logDirFailureChannel.maybeAddOfflineLogDir(dirPath, s"Error while writing meta.properties to $dirPath", e){code}
As a result, after an {{IOException}} is captured and enqueued in the log dir failure channel ({{<logDir>}} is to be replaced with the actual path of the log directory):
{code:java}
[2023-09-22 17:07:32,052] ERROR Error while writing meta.properties to <logDir>/meta.properties (kafka.server.LogDirFailureChannel)
java.io.IOException{code}
The log dir failure handler cannot look up the log directory:
{code:java}
[2023-09-22 17:07:32,053] ERROR [LogDirFailureHandler]: Error due to (kafka.server.ReplicaManager$LogDirFailureHandler)
org.apache.kafka.common.errors.LogDirNotFoundException: Log dir <logDir>/meta.properties is not found in the config.{code}
An immediate fix for this is to use the {{logDir}} provided to the checkpointing method instead of the path of the metadata file.

For brokers with only one log directory, this bug prevents the broker from shutting down as expected. The {{LogDirNotFoundException}} kills the log dir failure handler thread, subsequent {{IOException}}s are not handled, and the broker never stops.
{code:java}
[2024-02-27 02:13:13,564] INFO [LogDirFailureHandler]: Stopped (kafka.server.ReplicaManager$LogDirFailureHandler){code}
Another consideration here is whether the {{LogDirNotFoundException}} should terminate the log dir failure handler thread.
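To illustrate why the lookup fails, here is a minimal, self-contained sketch. It is not Kafka code: {{LogDirPathSketch}}, {{CONFIGURED_LOG_DIRS}}, {{isConfiguredLogDir}} and the {{/tmp/kafka-logs}} path are all hypothetical stand-ins, assuming only that the failure handler matches the reported path against the configured log directories.

```java
import java.io.File;
import java.util.Set;

public class LogDirPathSketch {
    // Hypothetical stand-in for the log directories configured on the broker.
    static final Set<String> CONFIGURED_LOG_DIRS =
        Set.of(new File("/tmp/kafka-logs").getAbsolutePath());

    // Hypothetical stand-in for the lookup done by the log dir failure handler.
    static boolean isConfiguredLogDir(String path) {
        return CONFIGURED_LOG_DIRS.contains(path);
    }

    // What the current code reports: the absolute path of the checkpoint file.
    static String reportedPathBuggy(String logDir) {
        return new File(logDir, "meta.properties").getAbsolutePath();
    }

    // What the proposed fix reports: the log directory given to the checkpointing method.
    static String reportedPathFixed(String logDir) {
        return new File(logDir).getAbsolutePath();
    }

    public static void main(String[] args) {
        String logDir = "/tmp/kafka-logs";
        // The file path is not a configured log dir, so the lookup fails.
        System.out.println(isConfiguredLogDir(reportedPathBuggy(logDir)));
        // The directory path matches a configured log dir.
        System.out.println(isConfiguredLogDir(reportedPathFixed(logDir)));
    }
}
```

Under these assumptions, the first lookup misses and the second one matches, which mirrors the {{LogDirNotFoundException}} reported for {{<logDir>/meta.properties}}.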
  (was: the same description, without the final sentence on whether the {{LogDirNotFoundException}} should terminate the log dir failure handler thread)

> Invalid path provided to the log failure channel upon I/O error when writing broker metadata checkpoint
> -------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15490
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15490
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.5.1
>            Reporter: Alexandre Dupriez
>            Priority: Minor

--
This message was sent by Atlassian Jira
(v8.20.10#820010)