[ https://issues.apache.org/jira/browse/KAFKA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450106#comment-13450106 ]
Neha Narkhede commented on KAFKA-203:
-------------------------------------

It's great to see a patch that fixes metrics.

1. Partition
In the isUnderReplicated API, shouldn't the size of the in-sync replica set be compared to the replication factor of that partition rather than the default replication factor?

2. ZookeeperConsumerConnector
There are a bunch of interesting metrics here that are very useful while troubleshooting. For example:
2.1 Queue size per topic and consumer thread id: When the consumer client's processing slows down, the consumer's queues back up. Currently, to troubleshoot this issue, we need to take thread dumps. If we had the right monitoring on the consumer, we could just look at the metrics to figure out the problem.
2.2 Fetch requests per second per fetcher: Useful for tracking the progress of the fetcher thread. Here, the bean could be named after the id of the broker that the fetcher is connected to, along the lines of the per-key purgatory metrics.

3. KafkaController
3.1 We need a way to tell if a partition is offline. If all replicas of a partition go offline, no leader can be elected for that partition and an alert would have to be raised.
3.2 We also need to be able to measure:
3.2.1 Leader election latency
3.2.2 Leader election rate

4. ReplicaManager
4.1 Rename ISRExpandRate to isrExpandRate
4.2 Rename ISRShrinkRate to isrShrinkRate
4.3 I'm not sure how useful it is to have a count of leaders and under-replicated partitions. We do, however, need a per-partition status that tells whether the partition is offline or under-replicated.

5. TopicMetadataTest
Wrap the long line.

6. system_test
We have to add the new metrics to the metrics.json file so that we can view the metrics on every test run. Not sure if you want to push that to a separate JIRA or not?
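To make points 1, 3.1 and 4.3 concrete, here is a rough sketch of the check being suggested. It is not taken from the patch: the class name PartitionStatus, the State enum, and the method signatures are all hypothetical, and replicas are represented simply as sets of broker ids. The only point is that the ISR size is compared against the partition's own assigned replica count, and that a partition with no live replica is reported as offline.

```java
import java.util.Collections;
import java.util.Set;

/**
 * Illustrative only: a hypothetical helper, not Kafka's Partition class.
 * Replicas are modelled as sets of broker ids.
 */
public final class PartitionStatus {

    /** Hypothetical per-partition status, as asked for in 4.3. */
    public enum State { ONLINE, UNDER_REPLICATED, OFFLINE }

    private PartitionStatus() {}

    /**
     * Under-replicated means the ISR is smaller than the number of replicas
     * assigned to THIS partition (its own replication factor), not a
     * cluster-wide default.
     */
    public static boolean isUnderReplicated(Set<Integer> assignedReplicas,
                                            Set<Integer> inSyncReplicas) {
        return inSyncReplicas.size() < assignedReplicas.size();
    }

    /**
     * Offline is assumed here to mean that none of the assigned replicas is
     * on a live broker, so no leader can be elected (point 3.1).
     */
    public static State stateOf(Set<Integer> assignedReplicas,
                                Set<Integer> inSyncReplicas,
                                Set<Integer> liveBrokers) {
        if (Collections.disjoint(assignedReplicas, liveBrokers)) {
            return State.OFFLINE;
        }
        return isUnderReplicated(assignedReplicas, inSyncReplicas)
                ? State.UNDER_REPLICATED
                : State.ONLINE;
    }
}
```

With this shape, a partition assigned to brokers {1, 2, 3} with ISR {1, 2} is under-replicated even if the default replication factor is 2, which is the case the comparison against the default would miss.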
> Improve Kafka internal metrics
> ------------------------------
>
>                 Key: KAFKA-203
>                 URL: https://issues.apache.org/jira/browse/KAFKA-203
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jun Rao
>              Labels: tools
>         Attachments: kafka-203_v1.patch
>
>
> Currently metrics in kafka are using old-school JMX directly. This makes
> adding metrics a pain. It would be good to do one of the following:
> 1. Convert to Coda Hale's metrics package
> (https://github.com/codahale/metrics)
> 2. Write a simple metrics package
> The new metrics package should make metrics easier to add and work with and
> package up the common logic of keeping windowed gauges, histograms, counters,
> etc. JMX should be just one output of this.
> The advantage of the Coda Hale package is that it exists so we don't need to
> write it. The downsides are (1) introduces another client dependency which
> causes conflicts, and (2) seems a bit heavy on design. The good news is that
> the metrics-core package doesn't seem to bring in a lot of dependencies which
> is nice, though the scala wrapper seems to want scala 2.9. I am also a little
> skeptical of the approach for histograms--it does sampling instead of
> bucketing though that may be okay.