Sam Lendle created KAFKA-7240:
---------------------------------
Summary: -total metrics in Streams are incorrect
Key: KAFKA-7240
URL: https://issues.apache.org/jira/browse/KAFKA-7240
Project: Kafka
Issue Type: Bug
Components: metrics, streams
Affects Versions: 2.0.0
Reporter: Sam Lendle
I noticed that the values of the *-total metrics in Streams were periodically decreasing
when viewed in JMX, for example process-total for each processor-node-id under
stream-processor-node-metrics.
Looking at StreamsMetricsThreadImpl, I believe this behavior is due to using
Count() as the Stat for the *-total metrics. Count() is a SampledStat, so the
value it reports is the count in recent time windows, and the value decreases
whenever a window is purged.
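Here is a simplified, self-contained model of why a windowed count can go down. It is not Kafka's SampledStat or Count code. The class names (WindowedCount, CumulativeCount), the 1000 ms window, the two retained samples and the event times are all illustrative assumptions. The model keeps a count per time window, reports the sum over the windows still retained, and drops windows that are too old. A monotonic counter, which is what a *-total metric is expected to report, is shown for comparison.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only (hypothetical names, NOT Kafka's implementation).
public class TotalMetricDemo {

    /** Count kept per time window; only recent windows contribute to measure(). */
    static final class WindowedCount {
        private final long windowMs;
        private final int maxSamples;
        private final Deque<long[]> samples = new ArrayDeque<>(); // each entry: {windowStartMs, count}

        WindowedCount(long windowMs, int maxSamples) {
            this.windowMs = windowMs;
            this.maxSamples = maxSamples;
        }

        void record(long nowMs) {
            purge(nowMs);
            long start = nowMs - (nowMs % windowMs);
            long[] last = samples.peekLast();
            if (last == null || last[0] != start) {
                last = new long[] {start, 0};
                samples.addLast(last);
            }
            last[1]++; // only the event is counted
        }

        long measure(long nowMs) {
            purge(nowMs);
            long sum = 0;
            for (long[] s : samples) {
                sum += s[1];
            }
            return sum;
        }

        // Drop windows older than maxSamples * windowMs; their counts are lost for good.
        private void purge(long nowMs) {
            long maxAgeMs = maxSamples * windowMs;
            samples.removeIf(s -> nowMs - s[0] >= maxAgeMs);
        }
    }

    /** Monotonic counter: never decreases. */
    static final class CumulativeCount {
        private long count;

        void record() { count++; }

        long measure() { return count; }
    }

    public static void main(String[] args) {
        WindowedCount windowed = new WindowedCount(1000, 2);
        CumulativeCount cumulative = new CumulativeCount();
        long[] eventTimesMs = {0, 100, 200, 1200, 1300};
        for (long t : eventTimesMs) {
            windowed.record(t);
            cumulative.record();
        }
        // prints "t=1500: windowed=5 cumulative=5"
        System.out.println("t=1500: windowed=" + windowed.measure(1500) + " cumulative=" + cumulative.measure());
        // prints "t=2500: windowed=2 cumulative=5" (the window starting at 0 has been purged)
        System.out.println("t=2500: windowed=" + windowed.measure(2500) + " cumulative=" + cumulative.measure());
    }
}
```

With no new events, the windowed value falls from 5 to 2 once the oldest window expires. This matches the kind of periodic drop seen in JMX, while the cumulative counter stays at 5.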
----
This explains the behavior I saw, but I think the issue is deeper. For example,
processTimeSensor attempts to measure process-latency-avg,
process-latency-max, process-rate, and process-total. For that sensor, record
is called like this:
    streamsMetrics.processTimeSensor.record(computeLatency() / (double) processed, timerStartedMs);
so, if I understand correctly, the value passed to record is the average latency
per processed message in this batch. That value gets passed through to
Count#record, which increments its count by 1 and ignores the value parameter.
Whatever stat records the total would need to know the number of
messages processed. Because of that, I don't think it's possible for one Sensor
to measure both latency and total.
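Here is a toy illustration of the argument. It is not Kafka's Sensor/Stat API: SimpleSensor, CountStat, AvgStat, SumStat and the batch sizes and latencies are hypothetical. The model assumes, as described above, that a sensor hands every recorded value to all of its stats and that a Count-like stat ignores that value. One record() call per batch then makes the count equal the number of batches, not the number of messages. Feeding `processed` into a separate sensor with a summing stat is shown only as one possible way to get a message total. It is not a proposed fix for Streams.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names, NOT Kafka's Sensor/Stat classes).
public class SharedSensorDemo {

    interface Stat {
        void record(double value);
        double measure();
    }

    // Counts record() calls and ignores the value.
    static final class CountStat implements Stat {
        private long calls;
        public void record(double value) { calls++; }
        public double measure() { return calls; }
    }

    // Averages the recorded values.
    static final class AvgStat implements Stat {
        private double sum;
        private long n;
        public void record(double value) { sum += value; n++; }
        public double measure() { return n == 0 ? Double.NaN : sum / n; }
    }

    // Sums the recorded values.
    static final class SumStat implements Stat {
        private double sum;
        public void record(double value) { sum += value; }
        public double measure() { return sum; }
    }

    // A sensor forwards each recorded value to every attached stat.
    static final class SimpleSensor {
        private final List<Stat> stats = new ArrayList<>();
        SimpleSensor add(Stat stat) { stats.add(stat); return this; }
        void record(double value) { for (Stat s : stats) s.record(value); }
    }

    public static void main(String[] args) {
        AvgStat latencyAvg = new AvgStat();
        CountStat processTotal = new CountStat();
        SimpleSensor processTimeSensor = new SimpleSensor().add(latencyAvg).add(processTotal);

        // Hypothetical separate sensor fed with the number of messages processed per batch.
        SumStat processedMessages = new SumStat();
        SimpleSensor processedSensor = new SimpleSensor().add(processedMessages);

        int[] processedPerBatch = {10, 5, 25};
        double[] batchLatencyMs = {20.0, 15.0, 50.0};
        for (int i = 0; i < processedPerBatch.length; i++) {
            int processed = processedPerBatch[i];
            // One call per batch with the per-message average latency,
            // mirroring the processTimeSensor.record(...) call quoted in the issue.
            processTimeSensor.record(batchLatencyMs[i] / processed);
            processedSensor.record(processed);
        }
        System.out.println("total via CountStat: " + processTotal.measure());        // prints 3.0 (batches)
        System.out.println("messages processed: " + processedMessages.measure());   // prints 40.0
        System.out.println("avg latency per message (ms): " + latencyAvg.measure());
    }
}
```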
That said, it's not clear to me how all the different Stats work or how
exactly Sensors work. I also don't understand how the process-rate metric
works, since the same reasoning would seem to apply to it, yet it appears to be
correct, so I may be missing something here.
cc [~guozhang]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)