[ https://issues.apache.org/jira/browse/KAFKA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443283#comment-13443283 ]
Jun Rao commented on KAFKA-203:
-------------------------------

I propose that we add/keep the following set of metrics. Anything missed?

Server side:
A. Requests:
A1. produceRequestRate (meter, total)
A2. fetchRequestRate (meter, follower/non-follower)
A3. getMetadataRate (meter, total)
A4. getOffsetRate (meter, total)
A5. leaderAndISRRate (meter, total)
A6. stopReplicaRate (meter, total)
A7. produceRequestSizeHist (hist, total)
A8. fetchResponseSizeHist (hist, total)
A9. produceFailureRate (meter, topic/total)
A10. fetchFailureRate (meter, topic/total)
A11. produceRequestTime (timer, total)
A12. fetchRequestTime (timer, total)
A13. messagesInRate (meter, topic/total)
A14. messagesOutRate (meter, topic/total)
A15. messagesBytesInRate (meter, topic/total)
A16. messagesBytesOutRate (meter, topic/total)

B. Log:
B1. logFlushTime (timer, total)

C. Purgatory:
Produce:
C1. expiredRequestMeter (meter, partition/total)
C2. satisfactionTimeHist (hist, total)
Fetch:
C3. expiredRequestMeter (meter, follower/non-follower)
C4. satisfactionTimeHist (hist, follower/non-follower)
Both:
C5. delayedRequests (gauge, Fetch/Produce)

D. ReplicaManager:
D1. leaderPartitionCounts (gauge, total)
D2. underReplicatedPartitionCounts (|ISR| < replication factor, gauge, total)
D3. ISRExpandRate (meter, partition/total)
D4. ISRShrinkRate (meter, partition/total)

E. Controller:
E1. requestRate (meter, total)
E2. requestTimeHist (hist, total)
E3. controllerActiveCount (gauge, total)

Clients:
F. Producer:
F1. messageRate (meter, topic/total)
F2. byteRate (meter, topic/total)
F3. droppedEventRate (meter, total)
F4. requestRate (meter, total)
F5. requestSizeHist (hist, total)
F6. requestTimeHist (hist, total)
F7. resendRate (meter, total)
F8. failedSendRate (meter, total)
F9. getMetadataRate (meter, total)

G. Consumer:
G1. messageRate (meter, topic/total)
G2. byteRate (meter, topic/total)
G3. requestRate (meter, total)
G4. requestSizeHist (hist, total)
G5. requestTimeHist (hist, total)
G6. lagInBytes (gauge, partition)

Also, I propose that we remove the following metrics since they are either not very useful or are redundant.

Purgatory:
Produce:
* caughtUpFollowerFetchRequest (meter, partition/total): not very useful
* followerCatchupTime (hist, total): not very useful
* throughputMeter (meter, partition/total): same as bytesIn
* satisfiedRequestMeter (meter, total): not very useful
Fetch:
* satisfiedRequestMeter (meter, total): not very useful
* throughputMeter (meter, partition/total): same as bytesOut
Both:
* satisfactionRate (meter, Fetch/Produce): not very useful
* expirationRate (meter, Fetch/Produce/topic): already at Produce/Fetch level

> Improve Kafka internal metrics
> ------------------------------
>
> Key: KAFKA-203
> URL: https://issues.apache.org/jira/browse/KAFKA-203
> Project: Kafka
> Issue Type: New Feature
> Components: core
> Affects Versions: 0.8
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Labels: tools
>
> Currently metrics in kafka are using old-school JMX directly. This makes
> adding metrics a pain. It would be good to do one of the following:
> 1. Convert to Coda Hale's metrics package
> (https://github.com/codahale/metrics)
> 2. Write a simple metrics package
> The new metrics package should make metrics easier to add and work with and
> package up the common logic of keeping windowed gauges, histograms, counters,
> etc. JMX should be just one output of this.
> The advantage of the Coda Hale package is that it exists so we don't need to
> write it. The downsides are (1) introduces another client dependency which
> causes conflicts, and (2) seems a bit heavy on design. The good news is that
> the metrics-core package doesn't seem to bring in a lot of dependencies which
> is nice, though the scala wrapper seems to want scala 2.9. I am also a little
> skeptical of the approach for histograms--it does sampling instead of
> bucketing though that may be okay.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
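
To make the "(meter, topic/total)" scoping and the D2 definition (|ISR| < replication factor) in the comment above concrete, here is a rough Java sketch. It is not the Coda Hale metrics API and not Kafka code: `MetricsSketch`, `ScopedCounter`, and `underReplicatedPartitionCount` are hypothetical names, the counter only tracks raw counts (a real meter would also derive rates over time), and a single replication factor for all partitions is assumed for simplicity.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch only; stands in for whatever metrics package is chosen. */
final class MetricsSketch {

    /** Records events both per topic and in aggregate ("topic/total" scoping). */
    static final class ScopedCounter {
        private final LongAdder total = new LongAdder();
        private final Map<String, LongAdder> perTopic = new ConcurrentHashMap<>();

        /** Adds n events to the given topic and to the overall total. */
        void mark(String topic, long n) {
            total.add(n);
            perTopic.computeIfAbsent(topic, t -> new LongAdder()).add(n);
        }

        long total() {
            return total.sum();
        }

        /** Returns 0 for a topic that has never been marked. */
        long forTopic(String topic) {
            LongAdder adder = perTopic.get(topic);
            return adder == null ? 0L : adder.sum();
        }
    }

    /**
     * Gauge value for D2: the number of partitions whose in-sync replica set
     * is smaller than the replication factor (assumed uniform here).
     */
    static int underReplicatedPartitionCount(Map<String, Set<Integer>> isrByPartition,
                                             int replicationFactor) {
        int count = 0;
        for (Set<Integer> isr : isrByPartition.values()) {
            if (isr.size() < replicationFactor) {
                count++;
            }
        }
        return count;
    }
}
```

With something like this, messagesInRate (A13) would call `mark(topic, n)` once per appended batch, and the total scope falls out of the same call rather than needing a second metric.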