[
https://issues.apache.org/jira/browse/KAFKA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779956#comment-17779956
]
Haruki Okada commented on KAFKA-15688:
--------------------------------------
> Is it possible to add such a feature to Kafka so that it shuts down in this
> case as well?
That could be tricky to implement at the Kafka level: to make disk I/O time out
when the device hangs, a timer has to be set on another thread for every I/O,
because the thread executing the I/O can do nothing about the timeout itself.
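Just to illustrate why that is costly (a minimal sketch with a made-up wrapper
class, not actual Kafka code): every I/O would have to be handed off to a
separate thread and waited on with a deadline, and the worker thread can stay
stuck in the kernel even after the timeout fires.
{code:java}
import java.io.IOException;
import java.util.concurrent.*;

// Hypothetical per-I/O timeout wrapper, for illustration only.
public class IoTimeoutWrapper {
    private final ExecutorService ioExecutor = Executors.newCachedThreadPool();

    // Hand the I/O off to another thread and wait for it with a deadline.
    public <T> T runWithTimeout(Callable<T> ioTask, long timeoutMs) throws Exception {
        Future<T> future = ioExecutor.submit(ioTask);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel() interrupts the worker, but a thread blocked in a disk
            // syscall usually ignores the interrupt and stays hung.
            future.cancel(true);
            throw new IOException("disk I/O timed out after " + timeoutMs + " ms", e);
        }
    }
}
{code}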
I guess there are several options to address the issue:
1) Set an I/O timeout at the OS/device level so that a disk hang surfaces as an
IOException at the Kafka level (which causes Kafka to stop)
2) Deploy another process that watches disk health and kills Kafka on disk hang
(a rough sketch follows right after this list)
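For 2), something like the following could work: a standalone watchdog that
periodically writes and fsyncs a probe file on the same volume and force-kills
the broker if the write stalls. The probe path, the timeout, and the way the
broker pid is passed in are all assumptions for illustration, not an existing
tool.
{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.concurrent.*;

// Hypothetical external disk watchdog, for illustration only.
public class DiskWatchdog {
    public static void main(String[] args) throws Exception {
        Path probe = Paths.get(args[0]);          // e.g. a probe file on the log volume
        long brokerPid = Long.parseLong(args[1]); // pid of the Kafka broker to kill
        long timeoutMs = 30_000;
        ExecutorService executor = Executors.newSingleThreadExecutor();

        while (true) {
            Future<?> write = executor.submit(() -> {
                try (FileChannel ch = FileChannel.open(probe,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                    ch.write(ByteBuffer.wrap(new byte[]{1}));
                    ch.force(true); // fsync so the probe actually hits the device
                    return null;
                }
            });
            try {
                write.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // Disk looks hung or broken: kill the broker so leadership can move.
                ProcessHandle.of(brokerPid).ifPresent(ProcessHandle::destroyForcibly);
                return;
            }
            Thread.sleep(10_000);
        }
    }
}
{code}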
For either solution, one concern is that while a broker cannot process requests
due to a disk hang (without a leadership change), it may unexpectedly kick other
followers out of the ISR set before it gets killed, since it can't handle Fetch
requests and therefore can't advance the HW.
In that case the broker could end up as the last replica in the ISR, so stopping
it may take the partition offline, which would require unclean leader election.
[KIP-966|https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas]
could be the solution for this problem though.
Apart from the above, [https://github.com/apache/kafka/pull/14242] could
mitigate your issue, I guess.
The thing is, even when the disk hangs, produce shouldn't be disrupted, because
Kafka doesn't wait for log-append I/O to be synced to the device (unless too
many dirty pages accumulate).
However, as of Kafka 3.3.2, there are several paths that call fsync on log roll
while holding UnifiedLog#lock. Because of this, if the disk hangs during the
fsync, UnifiedLog#lock will be held for a long time and all subsequent requests
against the same partition may be blocked in the meantime.
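To illustrate the difference (a simplified sketch of the locking pattern, not
actual Kafka code and not the exact change in the patch above): if the fsync
runs under the per-partition lock, a hung device keeps the lock held; doing the
bookkeeping under the lock and flushing outside of it keeps other requests
unblocked.
{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;

// Simplified sketch of the locking pattern, for illustration only.
class LogFlushExample {
    private final Object lock = new Object();
    private FileChannel activeSegment;

    // Problematic shape: fsync runs while holding the per-partition lock.
    void rollAndFlushUnderLock() throws IOException {
        synchronized (lock) {
            // ... roll to a new segment ...
            activeSegment.force(true); // if the device hangs here, the lock is held indefinitely
        }
    }

    // Safer shape: do the bookkeeping under the lock, fsync outside of it
    // (or hand the flush off to a background thread).
    void rollThenFlushOutsideLock() throws IOException {
        FileChannel toFlush;
        synchronized (lock) {
            // ... roll to a new segment ...
            toFlush = activeSegment;
        }
        toFlush.force(true); // a hang here no longer blocks requests waiting on the lock
    }
}
{code}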
Actually, we encountered a similar issue on our on-prem Kafka, which runs on a
lot of HDDs, with some HDD breaking on a daily basis.
The frequency of the issue has indeed been mitigated by the above patch.
> Partition leader election not running when disk IO hangs
> --------------------------------------------------------
>
> Key: KAFKA-15688
> URL: https://issues.apache.org/jira/browse/KAFKA-15688
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 3.3.2
> Reporter: Peter Sinoros-Szabo
> Priority: Major
>
> We run our Kafka brokers on AWS EC2 nodes using AWS EBS as disk to store the
> messages.
> Recently we had an issue where the EBS disk I/O just stalled, so Kafka was
> not able to write or read anything from the disk, except for the data that
> was still in the page cache or that still fitted into the page cache before
> being synced to EBS.
> We experienced this issue in a few cases: sometimes partition leaders were
> moved away to other brokers automatically; in other cases that didn't happen,
> which caused the Producers to fail to produce messages to that broker.
> My expectation of Kafka in such a case would be that it notices this and
> moves the leaders to other brokers where the partition has in-sync replicas,
> but as I mentioned, this didn't always happen.
> I know Kafka will shut itself down if it can't write to its disk; that might
> be a good solution in this case as well, since it would trigger the leader
> election automatically.
> Is it possible to add such a feature to Kafka so that it shuts down in this
> case as well?
> I guess a similar issue might happen with other disk subsystems too, or even
> with a broken or slow disk.
> This scenario can be easily reproduced using AWS FIS.