li xiangyuan created KAFKA-14914:
------------------------------------
Summary: binarySearch in AbstractIndex may run in an infinite
loop
Key: KAFKA-14914
URL: https://issues.apache.org/jira/browse/KAFKA-14914
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.4.0
Reporter: li xiangyuan
Attachments: stack.1.txt, stack.2.txt, stack.3.txt
Recently our production brokers have repeatedly stopped handling requests
(3 times in less than 10 days so far). Please check the attached stack files.
They show that one I/O thread (data-plane-kafka-request-handler-11) holds the
read lock of the Partition's leaderIsrUpdateLock and keeps running the
binarySearch function. Once any thread (kafka-scheduler-2) requests the write
lock, every request that reads this partition blocks waiting for the read
lock until all I/O threads are used up, and then the broker cannot handle any
request.
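The blocking pattern described above can be reproduced in isolation. The
following is only a minimal sketch, assuming leaderIsrUpdateLock behaves like a
default (non-fair) java.util.concurrent ReentrantReadWriteLock; the class and
method names are hypothetical and nothing here is taken from the Kafka code.
One thread holds the read lock and never finishes, a second thread queues for
the write lock, and a third caller then tries to take the read lock with a
timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Kafka.
class ReadLockStarvationDemo {

    // Returns true if a new reader obtains the read lock within timeoutMs
    // while another reader holds it and a writer is already queued.
    static boolean newReaderGetsLock(long timeoutMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        CountDownLatch readerHolds = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stand-in for the request handler stuck in binarySearch:
        // it takes the read lock and keeps it until told to release.
        Thread stuckReader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHolds.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        stuckReader.setDaemon(true);

        // Stand-in for the scheduler thread that needs the write lock.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        writer.setDaemon(true);

        try {
            stuckReader.start();
            readerHolds.await();
            writer.start();
            // Wait until the writer is parked in the lock's wait queue.
            while (writer.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            // Stand-in for every other request handler reading the partition.
            boolean acquired = lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            // Let the stuck reader go so both helper threads can finish.
            release.countDown();
            stuckReader.join();
            writer.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

As far as I understand OpenJDK's implementation, a new reader does not overtake
a writer that is waiting at the head of the queue, so
ReadLockStarvationDemo.newReaderGetsLock(200) would be expected to time out.
That would match the stack files: a single reader that never releases the lock
is enough to stall every later reader once a writer is waiting.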
The 3 stack files were captured about 6 minutes apart. From my point of view,
the only explanation I can think of is that the binarySearch function never
returns and thereby causes the deadlock, and I suspect the index entries in
the OffsetIndex (at least as seen through the mmap) are not sorted.
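The "not sorted" suspicion could be checked offline on a copy of the index
file. The sketch below assumes the offset index layout as I understand it:
fixed 8-byte entries, each a 4-byte relative offset followed by a 4-byte file
position. The class name, method name and the entries parameter are
hypothetical. The number of valid entries has to be supplied by the caller,
because an index file may be preallocated and padded with zero bytes after the
last real entry.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Kafka.
class OffsetIndexCheck {

    // Assumed entry layout: 4-byte relative offset + 4-byte file position.
    static final int ENTRY_SIZE = 8;

    // Returns the number of the first entry whose relative offset is not
    // strictly greater than that of the previous entry, or -1 if the first
    // `entries` entries are strictly increasing.
    static int firstUnsortedEntry(ByteBuffer index, int entries) {
        for (int i = 1; i < entries; i++) {
            // Absolute reads, so the buffer's position is left untouched.
            int prev = index.getInt((i - 1) * ENTRY_SIZE);
            int cur = index.getInt(i * ENTRY_SIZE);
            if (cur <= prev) {
                return i;
            }
        }
        return -1;
    }
}
```

The buffer could come from FileChannel.map on a copy of the .index file, or
from reading the file into a heap ByteBuffer. If both views of the same file
were checked and only one of them reported an unsorted entry, that would point
at the mmap view rather than the file contents.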
Detailed information:
this problem appeared on 2 brokers
broker version: 2.4.0
jvm: openjdk 11
hardware: aws c7g 4xlarge, an arm64 server. We recently upgraded our
servers from c6g 4xlarge to this type; while we used c6g we did not hit this
problem. We don't know whether arm or the aws c7g server has a problem.
other: once we restart the broker, it recovers, so we suspect the offset index
file is not corrupted and that something may be wrong with the mmap instead.
Please give any suggestions for solving this problem, thanks.