Tzu-Li (Gordon) Tai created FLINK-6109:
------------------------------------------
Summary: Add "consumer lag" report metric to FlinkKafkaConsumer
Key: FLINK-6109
URL: https://issues.apache.org/jira/browse/FLINK-6109
Project: Flink
Issue Type: New Feature
Components: Kafka Connector, Streaming Connectors
Reporter: Tzu-Li (Gordon) Tai
This feature was discussed on the user mailing list:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Telling-if-a-job-has-caught-up-with-Kafka-td12261.html
As discussed, we can expose two kinds of "consumer lag" metrics for this:
- "current consumer lag for partition": the current difference between the
latest offset of a partition and the offset of the last record collected from
it. This metric is calculated and updated at a configurable interval. It
serves as an indicator of how well the consumer is keeping up with the head of
the partitions. I propose to name this `currentOffsetLag`.
- "consumer lag of last checkpoint": the difference between the latest offset
and the offset stored in the checkpoint of a partition. This metric is only
updated when checkpoints are completed. It serves as an indicator of how much
data may need to be replayed in case of a failure. I propose to name this
`lastCheckpointedOffsetLag`.
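
As a rough illustration of the two metrics above, here is a minimal, self-contained sketch in plain Java. It does not use any Flink or Kafka API. The class `PartitionLagSketch` and all of its methods are hypothetical names made up for this example, and clamping the lag at zero is an assumption rather than part of the proposal. Whether an off-by-one adjustment is needed depends on how the "latest offset" is defined, which this sketch leaves open.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not part of the Flink Kafka connector. */
class PartitionLagSketch {
    // partition id -> offset of the last record collected (the consumer's internal state)
    private final Map<Integer, Long> lastCollectedOffsets = new HashMap<>();
    // partition id -> most recently computed value of each metric
    private final Map<Integer, Long> currentOffsetLag = new HashMap<>();
    private final Map<Integer, Long> lastCheckpointedOffsetLag = new HashMap<>();

    /** Difference between the latest offset and a consumer-side offset (assumption: clamped at 0). */
    static long lag(long latestOffset, long consumerOffset) {
        return Math.max(0L, latestOffset - consumerOffset);
    }

    /** Called for every collected record; does NOT update the metric by itself. */
    void recordCollected(int partition, long offset) {
        lastCollectedOffsets.put(partition, offset);
    }

    /** Called at the configured interval with freshly fetched latest offsets per partition. */
    void refreshCurrentLag(Map<Integer, Long> latestOffsets) {
        for (Map.Entry<Integer, Long> e : lastCollectedOffsets.entrySet()) {
            Long latest = latestOffsets.get(e.getKey());
            if (latest != null) {
                currentOffsetLag.put(e.getKey(), lag(latest, e.getValue()));
            }
        }
    }

    /** Called only when a checkpoint completes, with the offsets stored in that checkpoint. */
    void checkpointCompleted(Map<Integer, Long> checkpointedOffsets,
                             Map<Integer, Long> latestOffsets) {
        for (Map.Entry<Integer, Long> e : checkpointedOffsets.entrySet()) {
            Long latest = latestOffsets.get(e.getKey());
            if (latest != null) {
                lastCheckpointedOffsetLag.put(e.getKey(), lag(latest, e.getValue()));
            }
        }
    }

    /** Returns -1 if no value has been computed yet for the partition. */
    long getCurrentOffsetLag(int partition) {
        return currentOffsetLag.getOrDefault(partition, -1L);
    }

    /** Returns -1 if no checkpoint has completed yet for the partition. */
    long getLastCheckpointedOffsetLag(int partition) {
        return lastCheckpointedOffsetLag.getOrDefault(partition, -1L);
    }
}
```

In this sketch, `currentOffsetLag` only changes when `refreshCurrentLag` runs, and `lastCheckpointedOffsetLag` only changes when `checkpointCompleted` runs, mirroring the update rules described for the two metrics.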
The granularity of the metric is per-FlinkKafkaConsumer, and independent of the
consumer group.id used (the offset used to calculate consumer lag is the
internal offset state of the FlinkKafkaConsumer, not the consumer group's
committed offsets in Kafka).