Stig Rohde Døssing created KAFKA-17503:
------------------------------------------

             Summary: The poll-idle-ratio-avg consumer metric is calculated incorrectly
                 Key: KAFKA-17503
                 URL: https://issues.apache.org/jira/browse/KAFKA-17503
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.8.0
            Reporter: Stig Rohde Døssing


The poll-idle-ratio-avg metric is supposed to calculate the fraction of time a 
thread spends inside KafkaConsumer.poll, indicating that the thread is waiting 
for Kafka rather than processing records.



As [KIP-517|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127406453#KIP517:Addconsumermetricstoobserveuserpollbehavior-poll-idle-ratio-avg]
says, the fraction is supposed to be "time-inside-poll/total-time".

The metric's computation is slightly off because the total time ends up too 
large: it actually covers one full poll interval plus the time spent in the 
latest poll call.
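Written out as a formula, the discrepancy looks like this. The symbols are hypothetical: p_k is the time spent in the k-th poll call and o_k is the time spent outside poll right after it. The sketch assumes timeSinceLastPollMs is measured from the start of the previous poll to the start of the current one, as in the example below.

```latex
\text{expected}_k = \frac{p_k}{o_{k-1} + p_k}
\qquad
\text{computed}_k = \frac{p_k}{p_k + \underbrace{(p_{k-1} + o_{k-1})}_{\texttt{timeSinceLastPollMs}}}

% With p = o = 10 ms:
\text{expected} = \frac{10}{20} = 0.5,
\qquad
\text{computed} = \frac{10}{30} \approx 0.33
```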

For example, say that a thread consistently spends 10 ms in poll and 10 ms 
outside poll. We would expect a fraction of 0.5, but the metric as implemented 
calculates something different.

At time 0, we enter the first poll, setting pollStartMs to 0. 
At time 10, we exit the first poll, registering a meaningless value with the 
metric.
At time 20, we enter the second poll, setting pollStartMs to 20, and setting 
timeSinceLastPollMs to 20.
At time 30, we exit the second poll, having spent 10 ms in it. The calculation 
we get is then {code}10 / (10 + 20) = 0.33{code}.
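A small Java simulation can make the walkthrough concrete. It is a hypothetical, simplified model of the bookkeeping described in the steps above. The class PollIdleRatioDemo and its methods are illustrative names, not Kafka's actual consumer metrics API. It assumes timeSinceLastPollMs is measured start-to-start, as in the example. For comparison, it also records a ratio whose divisor is only the time since the previous poll returned. That is one possible reading of "total time", not necessarily the fix the project would adopt.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the poll-idle-ratio bookkeeping described
 * in this issue. Names are illustrative, not Kafka's actual API.
 */
public class PollIdleRatioDemo {
    private long pollStartMs;
    private long lastPollStartMs = -1; // -1 means "no previous poll yet"
    private long lastPollEndMs = -1;
    private long timeSinceLastPollMs;

    // Ratios computed with the formula described in the issue.
    final List<Double> reported = new ArrayList<>();
    // Hypothetical comparison: divisor is only the time since the previous poll returned.
    final List<Double> expected = new ArrayList<>();

    void pollStart(long nowMs) {
        pollStartMs = nowMs;
        // Start-to-start interval, as in the issue's walkthrough (0 on the first poll).
        timeSinceLastPollMs = lastPollStartMs >= 0 ? nowMs - lastPollStartMs : 0;
        lastPollStartMs = nowMs;
    }

    void pollEnd(long nowMs) {
        long pollTimeMs = nowMs - pollStartMs;
        // Divisor spans a full start-to-start interval plus the current poll.
        reported.add((double) pollTimeMs / (pollTimeMs + timeSinceLastPollMs));
        if (lastPollEndMs >= 0) {
            // Divisor spans only end of previous poll to end of this poll.
            expected.add((double) pollTimeMs / (nowMs - lastPollEndMs));
        }
        lastPollEndMs = nowMs;
    }

    public static void main(String[] args) {
        PollIdleRatioDemo m = new PollIdleRatioDemo();
        // 10 ms inside poll, 10 ms outside poll, repeated.
        m.pollStart(0);
        m.pollEnd(10);
        m.pollStart(20);
        m.pollEnd(30);
        System.out.println("reported: " + m.reported);
        System.out.println("expected: " + m.expected);
    }
}
```

Running main prints reported: [1.0, 0.3333333333333333] and expected: [0.5]. The first reported value is the meaningless one registered when the first poll exits.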

This answer is wrong because the divisor covers a larger period of time than 
one poll interval: 30 ms (the 20 ms between the two poll starts plus the 10 ms 
spent in the second poll) instead of 20 ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
