Stig Rohde Døssing created KAFKA-17503:
------------------------------------------
Summary: The poll-idle-ratio-avg consumer metric is calculated
incorrectly
Key: KAFKA-17503
URL: https://issues.apache.org/jira/browse/KAFKA-17503
Project: Kafka
Issue Type: Bug
Affects Versions: 3.8.0
Reporter: Stig Rohde Døssing
The poll-idle-ratio-avg metric is supposed to calculate the fraction of time a
thread spends inside KafkaConsumer.poll, indicating that the thread is waiting
for Kafka rather than processing records.
As
[KIP-517|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127406453#KIP517:Addconsumermetricstoobserveuserpollbehavior-poll-idle-ratio-avg]
says, the fraction is supposed to be "time-inside-poll/total-time".
The metric's computation is off because the total time ends up being too
large: it covers one full poll interval plus the time spent in the latest
poll call.
To give an example, say that a thread consistently spends 10 ms in poll and 10
ms outside poll. We would expect a fraction of 0.5, but the metric as
implemented calculates something different.
At time 0, we enter the first poll, setting pollStartMs to 0.
At time 10, we exit the first poll, registering a meaningless value with the
metric.
At time 20, we enter the second poll, setting pollStartMs to 20, and setting
timeSinceLastPollMs to 20.
At time 30, we exit the second poll. The calculation we get is then {code}10 /
(10 + 20) = 0.33{code}.
This answer is wrong because the divisor covers a longer period than one poll
interval: the 20 ms since the previous poll started, plus the 10 ms spent in
the latest poll.
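
The timeline above can be replayed with a small simulation. This is a minimal sketch in Python, not the Kafka client code: the class PollIdleTracker and its methods are hypothetical names that mirror the pollStartMs and timeSinceLastPollMs bookkeeping described in this issue. The "expected" ratio is one possible reading of "time-inside-poll/total-time", measured from the end of the previous poll to the end of the current one, and is an assumption rather than a proposed patch.

```python
class PollIdleTracker:
    """Hypothetical model of the poll bookkeeping described in this issue."""

    def __init__(self):
        self.poll_start_ms = 0
        self.last_poll_ms = None          # start time of the previous poll
        self.time_since_last_poll_ms = 0  # gap between consecutive poll starts
        self.last_poll_end_ms = None      # end time of the previous poll

    def record_poll_start(self, now_ms):
        self.poll_start_ms = now_ms
        if self.last_poll_ms is None:
            self.time_since_last_poll_ms = 0
        else:
            self.time_since_last_poll_ms = now_ms - self.last_poll_ms
        self.last_poll_ms = now_ms

    def record_poll_end(self, now_ms):
        """Return (ratio as described in the issue, expected ratio or None)."""
        poll_time_ms = now_ms - self.poll_start_ms
        # Divisor as described: latest poll time plus a full poll interval.
        as_described = poll_time_ms / (poll_time_ms + self.time_since_last_poll_ms)
        # Assumed intent: divide by exactly one interval, end of poll to end of poll.
        if self.last_poll_end_ms is None:
            expected = None  # no complete interval exists yet
        else:
            expected = poll_time_ms / (now_ms - self.last_poll_end_ms)
        self.last_poll_end_ms = now_ms
        return as_described, expected


def simulate():
    """Replay the example: 10 ms inside poll, 10 ms outside poll."""
    tracker = PollIdleTracker()
    results = []
    for start_ms, end_ms in [(0, 10), (20, 30)]:
        tracker.record_poll_start(start_ms)
        results.append(tracker.record_poll_end(end_ms))
    return results


if __name__ == "__main__":
    for as_described, expected in simulate():
        print(as_described, expected)
```

For the second poll, the described computation divides 10 by 30, while the end-to-end interval divides 10 by 20, which matches the expected fraction of 0.5.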