[ 
https://issues.apache.org/jira/browse/KAFKA-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-17503:
------------------------------------
    Component/s: clients
                 consumer

> The poll-idle-ratio-avg consumer metric is calculated incorrectly
> -----------------------------------------------------------------
>
>                 Key: KAFKA-17503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17503
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 3.8.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Minor
>
> The poll-idle-ratio-avg metric is supposed to calculate the fraction of time 
> a thread spends inside KafkaConsumer.poll, indicating that the thread is 
> waiting for Kafka rather than processing records.
> As 
> [KIP-517|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127406453#KIP517:Addconsumermetricstoobserveuserpollbehavior-poll-idle-ratio-avg]
>  says, the fraction is supposed to be "time-inside-poll/total-time".
> The metric's computation is off because the total time in the denominator 
> is too large: it covers one full poll interval plus the time spent in the 
> latest poll call.
> For example, suppose a thread consistently spends 10 ms in poll and 10 ms 
> outside poll. We would expect a fraction of 0.5, but the metric as 
> implemented calculates something different.
> At time 0, we enter the first poll, setting pollStartMs to 0. 
> At time 10, we exit the first poll, registering a meaningless value with the 
> metric.
> At time 20, we enter the second poll, setting pollStartMs to 20, and setting 
> timeSinceLastPollMs to 20.
> At time 30, we exit the second poll. The calculation we get is 
> {code}pollTimeMs / (pollTimeMs + timeSinceLastPollMs){code} which in this 
> case is {code}10 / (10 + 20) = 0.33{code}
> The answer is wrong because the divisor covers 30 ms, which is longer than 
> the 20 ms poll interval.
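
The timeline in the description can be replayed with a small simulation. This is only a sketch of the arithmetic, not the Kafka client code: the function name `simulate` and the handling of the first poll (treating the gap as 0) are assumptions made for illustration, and the variable names merely mirror pollStartMs and timeSinceLastPollMs from the ticket. It assumes poll_ms is greater than 0.

```python
def simulate(poll_ms, processing_ms, iterations):
    """Replay a steady poll loop; return (implemented, expected) ratios.

    'implemented' follows the formula quoted in the ticket for the last
    poll call; 'expected' is time-inside-poll / total-time from KIP-517.
    """
    now = 0
    poll_start_ms = None          # start time of the most recent poll
    time_since_last_poll_ms = 0   # gap between consecutive poll starts
    implemented = None

    for _ in range(iterations):
        # Entering poll: measure the gap since the previous poll started.
        # Assumption: the very first poll has no previous start, so use 0.
        if poll_start_ms is not None:
            time_since_last_poll_ms = now - poll_start_ms
        poll_start_ms = now

        now += poll_ms  # time spent inside poll

        # Leaving poll: the ratio as described in the ticket.
        poll_time_ms = now - poll_start_ms
        implemented = poll_time_ms / (poll_time_ms + time_since_last_poll_ms)

        now += processing_ms  # time spent outside poll

    expected = poll_ms / (poll_ms + processing_ms)
    return implemented, expected


if __name__ == "__main__":
    implemented, expected = simulate(poll_ms=10, processing_ms=10, iterations=2)
    print(f"implemented={implemented:.2f} expected={expected:.2f}")
```

With 10 ms in poll and 10 ms outside it, the second poll yields roughly 0.33 from the quoted formula against an expected 0.5, matching the numbers in the description. The value after the first iteration is meaningless, as the ticket notes.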



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
