[
https://issues.apache.org/jira/browse/FLINK-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809002#comment-16809002
]
Shuyi Chen commented on FLINK-11912:
------------------------------------
Hi [~aitozi], the current approach does the following:
1) As the KafkaConsumer discovers a new partition, it adds the partition
information to _manualRegisteredMetricSet_.
2) In the consumer polling run loop, on every iteration/poll, it checks
whether any partition in _manualRegisteredMetricSet_ has not yet been
registered. If some are left, it checks whether the KafkaConsumer has already
exposed the metric for those partitions and, if so, registers them with Flink.
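The two steps above can be sketched roughly as follows. This is only an
illustration, not the code in the patch: the class PendingLagMetrics, its
method names, and the plain maps standing in for KafkaConsumer's metrics and
Flink's metric group are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: tracks partitions whose lag metric still has to be registered.
class PendingLagMetrics {
    // Plays the role of manualRegisteredMetricSet: discovered, not yet registered.
    private final Set<String> pending = new HashSet<>();
    // Stand-in for Flink's metric group (assumption: partition name -> lag value).
    private final Map<String, Double> registered = new HashMap<>();

    // Step 1: called when partition discovery reports a new partition.
    void onPartitionDiscovered(String partition) {
        if (!registered.containsKey(partition)) {
            pending.add(partition);
        }
    }

    // Step 2: called once per poll iteration. `exposed` stands in for the
    // metrics that the KafkaConsumer currently exposes.
    void tryRegister(Map<String, Double> exposed) {
        if (pending.isEmpty()) {
            return; // nothing left to do, so the per-poll cost is a single check
        }
        Iterator<String> it = pending.iterator();
        while (it.hasNext()) {
            String partition = it.next();
            Double lag = exposed.get(partition);
            if (lag != null) {
                registered.put(partition, lag); // register with the stand-in registry
                it.remove();                    // stop retrying this partition
            }
        }
    }

    Set<String> pendingPartitions() {
        return Collections.unmodifiableSet(pending);
    }

    Set<String> registeredPartitions() {
        return Collections.unmodifiableSet(registered.keySet());
    }
}
```

A partition stays in the pending set across polls until its metric shows up,
which is why a metric that appears late is still picked up.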
In short, once a new partition is discovered, the current approach keeps
trying to register its partition metric until the KafkaConsumer exposes it.
Therefore, I don't think we will lose partition lag metrics unless there are
bugs in the new partition discovery mechanism. What do you think?
> Expose per partition Kafka lag metric in Flink Kafka connector
> --------------------------------------------------------------
>
> Key: FLINK-11912
> URL: https://issues.apache.org/jira/browse/FLINK-11912
> Project: Flink
> Issue Type: New Feature
> Components: Connectors / Kafka
> Affects Versions: 1.6.4, 1.7.2
> Reporter: Shuyi Chen
> Assignee: Shuyi Chen
> Priority: Major
>
> In production, it's important that we expose the per-partition Kafka lag
> metric so that users can diagnose which Kafka partition is lagging.
> However, although the per-partition Kafka lag metrics are available in
> KafkaConsumer after 0.10.2, Flink was not able to register them properly
> because the metrics only become available after the consumer starts polling
> data from partitions. I would suggest the following fix:
> 1) In KafkaConsumerThread.run(), allocate a manualRegisteredMetricSet.
> 2) In the fetch loop, as KafkaConsumer discovers new partitions, manually add
> the MetricName for each partition that we want to register to
> manualRegisteredMetricSet.
> 3) In the fetch loop, check if manualRegisteredMetricSet is empty. If not,
> search for the metrics available in KafkaConsumer, and for each one found,
> register it and remove the entry from manualRegisteredMetricSet.
> The overhead of the above approach is bounded and is only incurred when
> discovering new partitions, and registration is done once the KafkaConsumer
> has exposed the metrics.