[
https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219491#comment-17219491
]
Cheng Tan edited comment on KAFKA-8733 at 10/23/20, 6:22 AM:
-------------------------------------------------------------
[~flavr]
What would happen if you increased the lag timeout (replica.lag.time.max.ms)
to a fairly large value, so that the leaders do not remove replicas from the
ISR so aggressively? I know this may seem like a simple fix, but we are trying
to evaluate it.
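For concreteness, a minimal sketch of that change in the broker's
server.properties. The 60s value is purely illustrative, not a recommendation;
the stock default in the affected releases is 10s.

{code}
# server.properties -- illustrative value, not a recommendation
# (replica.lag.time.max.ms defaults to 10000 ms in the affected releases)
replica.lag.time.max.ms=60000
{code}

The trade-off is that a genuinely lagging follower stays in the ISR longer, so
acks=all produce requests can be held back by it for up to the configured lag.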
was (Author: d8tltanc):
[~flavr] increasing the lag timeout to a fairly large value so that the
leaders do not remove replicas from the ISR so aggressively
> Offline partitions occur when leader's disk is slow in reads while responding
> to follower fetch requests.
> ---------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-8733
> URL: https://issues.apache.org/jira/browse/KAFKA-8733
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 1.1.2, 2.4.0
> Reporter: Satish Duggana
> Assignee: Satish Duggana
> Priority: Critical
> Attachments: weighted-io-time-2.png, wio-time.png
>
>
> We have seen the offline-partitions issue multiple times on some of the hosts
> in our clusters. After going through the broker logs and the hosts' disk
> stats, it looks like this issue occurs whenever read/write operations take
> more time on that disk. In a particular case where the read time exceeds
> replica.lag.time.max.ms, follower replicas fall out of sync because their
> earlier fetch requests are stuck reading the local log, so their fetch status
> is not yet updated, as shown in the `ReplicaManager` code below. If reading
> data from the log stalls for longer than replica.lag.time.max.ms, then all
> the replicas fall out of sync and the partition becomes offline if
> min.insync.replicas > 1 and unclean.leader.election.enable is false.
>
> {code:java}
> def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
>   // This call took more than `replica.lag.time.max.ms`.
>   val result = readFromLocalLog(
>     replicaId = replicaId,
>     fetchOnlyFromLeader = fetchOnlyFromLeader,
>     readOnlyCommitted = fetchOnlyCommitted,
>     fetchMaxBytes = fetchMaxBytes,
>     hardMaxBytesLimit = hardMaxBytesLimit,
>     readPartitionInfo = fetchInfos,
>     quota = quota,
>     isolationLevel = isolationLevel)
>   // The follower's fetch time gets updated here, but maybeShrinkIsr may
>   // already have run and removed the replica from the ISR.
>   if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
>   else result
> }
> val logReadResults = readFromLog()
> {code}
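>
> For context, the leader-side ISR shrink on the other end of this race boils
> down to a lag check of roughly the following shape. This is a simplified,
> self-contained sketch: `lastCaughtUpTimeMs` and the shrink-on-lag rule mirror
> the Kafka source of this era, but the scaffolding around them is hypothetical.
>
> {code:java}
> // Simplified sketch of the leader-side ISR shrink check (scaffolding is hypothetical).
> object IsrShrinkSketch {
>   final case class Replica(brokerId: Int, lastCaughtUpTimeMs: Long)
>
>   // A replica is out of sync once it has not caught up within maxLagMs.
>   def outOfSyncReplicas(isr: Seq[Replica], nowMs: Long, maxLagMs: Long): Seq[Replica] =
>     isr.filter(r => nowMs - r.lastCaughtUpTimeMs > maxLagMs)
>
>   def main(args: Array[String]): Unit = {
>     val maxLagMs = 10000L  // replica.lag.time.max.ms
>     val now = System.currentTimeMillis()
>     // Followers stuck in a slow local-log read never get lastCaughtUpTimeMs
>     // refreshed, so the periodic shrink drops them from the ISR even though
>     // their fetch requests arrived in time.
>     val isr = Seq(Replica(1, now), Replica(2, now - 15000L), Replica(3, now - 15000L))
>     println(s"Shrunk out of ISR: ${outOfSyncReplicas(isr, now, maxLagMs).map(_.brokerId)}")
>   }
> }
> {code}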
> Attached are graphs of the disk weighted I/O time stats from when this issue
> occurred. I will raise [KIP-501|https://s.apache.org/jhbpn] describing
> options for handling this scenario.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)