Philip Nee created KAFKA-18217:
----------------------------------
Summary: Slow HWM/LSO update might have subtle effect on the
consumer lag reporting
Key: KAFKA-18217
URL: https://issues.apache.org/jira/browse/KAFKA-18217
Project: Kafka
Issue Type: Improvement
Components: clients, consumer
Reporter: Philip Nee
We've discovered that the consumer lag metrics appear spiky for the
AsyncKafkaConsumer. We examined how the HWM/LSO is updated and measured the
update cadence of the two consumer implementations using the local examples.
TL;DR: consumer lag metrics can sometimes be off due to KAFKA-18216 and the
slowness of HWM/LSO updates.
Context: the Fetcher performs multiple consumer lag measurements between two
HWM/LSO updates. The more frequently the HWM/LSO is updated, the more accurate
the lag measurement is, because
lag = HWM/LSO - fetch position
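To illustrate why the update cadence matters, here is a toy Java model of the formula above. It is a minimal sketch, not the Fetcher's actual code: `LagSketch`, `updateEndOffset`, and `advancePosition` are hypothetical names. It assumes lag is recorded against a cached HWM/LSO that only changes when a fresh value arrives (one plausible situation is records from a single fetch response being returned over several polls), so the recorded lag drifts toward zero between updates and then jumps when the next update lands.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Kafka's Fetcher code) of
 * lag = HWM/LSO - fetch position, where the HWM/LSO is a cached value
 * that only changes when a fresh one arrives.
 */
public class LagSketch {

    // Cached end offset: the HWM (read_uncommitted) or LSO (read_committed).
    private long cachedEndOffset;
    // Offset of the next record the consumer will return.
    private long fetchPosition;
    // Every lag value recorded, in order.
    private final List<Long> recordedLags = new ArrayList<>();

    public LagSketch(long initialEndOffset, long initialPosition) {
        this.cachedEndOffset = initialEndOffset;
        this.fetchPosition = initialPosition;
    }

    /** Called when a fresh HWM/LSO arrives; no lag is recorded here. */
    public void updateEndOffset(long newEndOffset) {
        this.cachedEndOffset = newEndOffset;
    }

    /** Consume some records, then record lag against the possibly stale end offset. */
    public void advancePosition(long records) {
        fetchPosition += records;
        // Clamp at zero so the toy model never reports negative lag.
        recordedLags.add(Math.max(0L, cachedEndOffset - fetchPosition));
    }

    public List<Long> recordedLags() {
        return recordedLags;
    }

    public static void main(String[] args) {
        LagSketch sketch = new LagSketch(1000, 0);
        // Four lag recordings against the same cached HWM/LSO of 1000.
        for (int i = 0; i < 4; i++) {
            sketch.advancePosition(250);
        }
        // A late HWM/LSO update reveals that the log has grown to 5000.
        sketch.updateEndOffset(5000);
        sketch.advancePosition(250);
        System.out.println(sketch.recordedLags()); // [750, 500, 250, 0, 3750]
    }
}
```

In this toy run the recorded lag falls from 750 to 0 and then jumps to 3750. The larger and less frequent the HWM/LSO increments, the larger such jumps become, which is one way a lag graph can look spiky.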
The elementary statistics below show the behavioral differences between the two
consumer implementations. The data will vary depending on the platform running
these tests, so it is provided only for the reader's reference. (These are the
outputs of my custom script.) Both consumers were measured by producing and
consuming 200 million records.
AsyncKafkaConsumer
HWM/LSO updates: 7179
Average HWM/LSO increment: 3589.99
Standard deviation of increment: 2381.07
Average 'recording lag' count: 7.69
Standard deviation of 'recording lag' count: 4.66

ClassicKafkaConsumer
HWM/LSO updates: 58418
Average HWM/LSO increment: 1223.02
Standard deviation of increment: 532.52
Average 'recording lag' count: 2.95
Standard deviation of 'recording lag' count: 1.10
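The custom script that produced these numbers is not included in the issue. As a rough illustration only, here is a Java sketch of how such statistics could be derived. It assumes the script logs an ordered stream of events, each either an HWM/LSO update or a single lag recording. The `Event` and `Intervals` classes, the `split` helper, and the use of the population standard deviation are all hypothetical choices, not the original script.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reconstruction of the cadence statistics (original script not shown). */
public class HwmCadenceStats {

    /** One logged event (hypothetical format): an HWM/LSO update or a lag recording. */
    public static final class Event {
        final boolean endOffsetUpdate;
        final long endOffset; // only meaningful for updates

        private Event(boolean endOffsetUpdate, long endOffset) {
            this.endOffsetUpdate = endOffsetUpdate;
            this.endOffset = endOffset;
        }

        public static Event update(long endOffset) { return new Event(true, endOffset); }
        public static Event lagRecording() { return new Event(false, -1L); }
    }

    /** Per closed interval between consecutive updates: the increment and the lag recordings in it. */
    public static final class Intervals {
        public final List<Long> increments = new ArrayList<>();
        public final List<Long> recordings = new ArrayList<>();
    }

    public static Intervals split(List<Event> events) {
        Intervals out = new Intervals();
        Long previous = null; // last HWM/LSO seen; null before the first update
        long count = 0;       // lag recordings since the last update
        for (Event e : events) {
            if (!e.endOffsetUpdate) {
                count++;
                continue;
            }
            if (previous != null) {
                out.increments.add(e.endOffset - previous);
                out.recordings.add(count);
            }
            previous = e.endOffset;
            count = 0;
        }
        return out; // recordings after the last update are not in a closed interval
    }

    public static double mean(List<Long> xs) {
        if (xs.isEmpty()) return 0.0;
        double sum = 0.0;
        for (long x : xs) sum += x;
        return sum / xs.size();
    }

    /** Population standard deviation (the original script's choice is unknown). */
    public static double stdDev(List<Long> xs) {
        if (xs.isEmpty()) return 0.0;
        double m = mean(xs);
        double sumSq = 0.0;
        for (long x : xs) sumSq += (x - m) * (x - m);
        return Math.sqrt(sumSq / xs.size());
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                Event.update(1000), Event.lagRecording(), Event.lagRecording(),
                Event.update(4000), Event.lagRecording(), Event.lagRecording(), Event.lagRecording(),
                Event.update(6000), Event.lagRecording());
        Intervals iv = split(events);
        System.out.printf("Average HWM/LSO increment: %.2f%n", mean(iv.increments));
        System.out.printf("Standard deviation of increment: %.2f%n", stdDev(iv.increments));
        System.out.printf("Average 'recording lag' count: %.2f%n", mean(iv.recordings));
        System.out.printf("Standard deviation of 'recording lag' count: %.2f%n", stdDev(iv.recordings));
    }
}
```

With the toy input in `main`, the closed intervals have increments of 3000 and 2000 and contain 2 and 3 lag recordings respectively; the trailing recording after the last update is ignored because its interval is not yet closed.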
--
This message was sent by Atlassian Jira
(v8.20.10#820010)