[ 
https://issues.apache.org/jira/browse/KAFKA-15490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Dupriez updated KAFKA-15490:
--------------------------------------
    Description: 
There is a small bug/typo in the handling of I/O errors when writing the broker 
metadata checkpoint in {{KafkaServer}}. The path provided to the log dir 
failure channel is the full path of the checkpoint file, whereas only the log 
directory is expected 
([source|https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/server/KafkaServer.scala#L958C8-L961C8]).
{code:java}
case e: IOException =>
   val dirPath = checkpoint.file.getAbsolutePath
   logDirFailureChannel.maybeAddOfflineLogDir(dirPath, s"Error while writing meta.properties to $dirPath", e){code}
As a result, an {{IOException}} is captured and enqueued in the log dir 
failure channel with the file path rather than the directory path 
({{<logDir>}} stands for the actual path of the log directory):
{code:java}
[2023-09-22 17:07:32,052] ERROR Error while writing meta.properties to <logDir>/meta.properties (kafka.server.LogDirFailureChannel)
java.io.IOException{code}
The log dir failure handler cannot look up the log directory:
{code:java}
[2023-09-22 17:07:32,053] ERROR [LogDirFailureHandler]: Error due to (kafka.server.ReplicaManager$LogDirFailureHandler)
org.apache.kafka.common.errors.LogDirNotFoundException: Log dir <logDir>/meta.properties is not found in the config.{code}
An immediate fix for this is to use the {{logDir}} provided to the 
checkpointing method instead of the path of the metadata file.
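
The fix could look roughly like the sketch below. It is not the actual patch: {{FakeLogDirFailureChannel}}, {{CheckpointFixSketch}} and the {{write}} parameter are hypothetical stand-ins introduced here so the snippet is self-contained, and only the argument passed to {{maybeAddOfflineLogDir}} reflects the proposed change.
{code:scala}
import java.io.{File, IOException}
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the log dir failure channel: it only records reported paths.
class FakeLogDirFailureChannel {
  val offlineDirs: ListBuffer[String] = ListBuffer.empty[String]

  def maybeAddOfflineLogDir(logDir: String, msg: => String, e: IOException): Unit =
    offlineDirs += logDir
}

object CheckpointFixSketch {
  // Hypothetical helper: `write` stands in for the checkpoint write and may throw IOException.
  def checkpoint(logDir: String, channel: FakeLogDirFailureChannel)(write: File => Unit): Unit = {
    val file = new File(logDir, "meta.properties")
    try write(file)
    catch {
      case e: IOException =>
        // Report the log directory itself, not file.getAbsolutePath.
        channel.maybeAddOfflineLogDir(logDir, s"Error while writing meta.properties to $logDir", e)
    }
  }
}
{code}
With this change the failure channel receives a path that matches a configured log directory, so the handler can find it.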

For brokers with only one log directory, this bug prevents the broker from 
shutting down as expected.

The {{LogDirNotFoundException}} then kills the log dir failure handler 
thread, so subsequent {{IOException}}s are not handled and the broker never 
stops.
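
The failure mode can be illustrated with a simplified model. This is a sketch under stated assumptions, not Kafka's handler code: {{HandlerThreadSketch}} and {{runHandler}} are hypothetical names, the queue of strings stands in for the failure channel, and an {{IllegalStateException}} stands in for {{LogDirNotFoundException}}.
{code:scala}
import java.util.concurrent.LinkedBlockingQueue

object HandlerThreadSketch {
  // Hypothetical simplified handler: it throws when the reported path is not a configured log dir.
  def runHandler(configuredDirs: Set[String], queue: LinkedBlockingQueue[String]): Thread = {
    val t = new Thread(() => {
      while (true) {
        val dir = queue.take()
        if (!configuredDirs.contains(dir))
          // The uncaught exception ends the loop, and with it the thread.
          throw new IllegalStateException(s"Log dir $dir is not found in the config.")
        // A real handler would take the directory offline here.
      }
    })
    t.setDaemon(true)
    t.start()
    t
  }
}
{code}
Once the thread has exited on the unknown path, any failure reported afterwards, even one carrying a valid log directory, stays in the queue and is never processed.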


> Invalid path provided to the log failure channel upon I/O error when writing 
> broker metadata checkpoint
> -------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15490
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15490
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.5.1
>            Reporter: Alexandre Dupriez
>            Assignee: Alexandre Dupriez
>            Priority: Minor


