[ 
https://issues.apache.org/jira/browse/KAFKA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443283#comment-13443283
 ] 

Jun Rao commented on KAFKA-203:
-------------------------------

I propose that we add/keep the following set of metrics. Anything missed?

Server side:
A. Requests:
A1. produceRequestRate (meter, total)
A2. fetchRequestRate (meter, follower/non-follower)
A3. getMetadataRate (meter, total)
A4. getOffsetRate (meter, total)
A5. leaderAndISRRate (meter, total)
A6. stopReplicaRate (meter, total)
A7. produceRequestSizeHist (hist, total)
A8. fetchResponseSizeHist (hist, total)
A9. produceFailureRate (meter, topic/total)
A10. fetchFailureRate (meter, topic/total)
A11. produceRequestTime (timer, total)
A12. fetchRequestTime (timer, total)
A13. messagesInRate (meter, topic/total)
A14. messagesOutRate (meter, topic/total)
A15. messagesBytesInRate (meter, topic/total)
A16. messagesBytesOutRate (meter, topic/total)
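
To make the "topic/total" scope concrete, here is a minimal sketch, assuming that scope means each event is recorded both under its topic and in a broker-wide aggregate. ScopedCounter and its fields are hypothetical names, not Kafka code, and a plain counter stands in for a real meter, which would also track rates over time.

```python
from collections import defaultdict


class ScopedCounter:
    """Counts events per topic and in aggregate (hypothetical helper)."""

    def __init__(self):
        self.total = 0
        self.per_topic = defaultdict(int)

    def mark(self, topic, n=1):
        # Every event updates both the topic-level and the total count.
        self.per_topic[topic] += n
        self.total += n


# Stand-in for something like messagesInRate (A13), counting only.
messages_in = ScopedCounter()
messages_in.mark("orders", 3)
messages_in.mark("clicks", 2)
```

Recording both scopes at mark time keeps the total cheap to read, at the cost of one extra increment per event.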

B. Log:
B1. logFlushTime (timer, total)

C. Purgatory:
Produce:
C1. expiredRequestMeter (meter, partition/total)
C2. satisfactionTimeHist (hist, total)

Fetch:
C3. expiredRequestMeter (meter, follower/non-follower)
C4. satisfactionTimeHist (hist, follower/non-follower)

Both:
C5. delayedRequests (gauge, Fetch/Produce)

D. ReplicaManager:
D1. leaderPartitionCounts (gauge, total)
D2. underReplicatedPartitionCounts (|ISR| < replication factor, gauge, total)
D3. ISRExpandRate (meter, partition/total)
D4. ISRShrinkRate (meter, partition/total)
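
The condition in D2 can be sketched as follows. This is only an illustration of "|ISR| < replication factor": the dictionary layout and the function name are assumptions, and the sketch does not decide whether the gauge should cover all partitions or only those led by this broker.

```python
def under_replicated_count(partitions):
    """Count partitions whose ISR is smaller than the replication factor."""
    return sum(
        1 for p in partitions if len(p["isr"]) < p["replication_factor"]
    )


# Hypothetical partition state; broker ids in "isr" are made up.
partitions = [
    {"topic": "orders", "partition": 0, "isr": [1, 2, 3], "replication_factor": 3},
    {"topic": "orders", "partition": 1, "isr": [1], "replication_factor": 3},
    {"topic": "clicks", "partition": 0, "isr": [2, 3], "replication_factor": 2},
]

under_replicated = under_replicated_count(partitions)
```

Here only orders/1 is under-replicated, since its ISR has one member against a replication factor of three.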

E. Controller:
E1. requestRate (meter, total)
E2. requestTimeHist (hist, total)
E3. controllerActiveCount (gauge, total)

Clients:
F. Producer:
F1. messageRate (meter, topic/total)
F2. byteRate (meter, topic/total)
F3. droppedEventRate (meter, total)
F4. requestRate (meter, total)
F5. requestSizeHist (hist, total)
F6. requestTimeHist (hist, total)
F7. resendRate (meter, total)
F8. failedSendRate (meter, total)
F9. getMetadataRate (meter, total) 

G. Consumer:
G1. messageRate (meter, topic/total)
G2. byteRate (meter, topic/total)
G3. requestRate (meter, total)
G4. requestSizeHist (hist, total)
G5. requestTimeHist (hist, total)
G6. lagInBytes (gauge, partition)

Also, I propose that we remove the following metrics since they are either not 
very useful or are redundant.
Purgatory:
Produce:
* caughtUpFollowerFetchRequest (meter, partition/total): not very useful
* followerCatchupTime (hist, total): not very useful
* throughputMeter (meter, partition/total): same as bytesIn
* satisfiedRequestMeter (meter, total): not very useful

Fetch:
* satisfiedRequestMeter (meter, total): not very useful
* throughputMeter (meter, partition/total): same as bytesOut

Both:
* satisfactionRate (meter, Fetch/Produce): not very useful
* expirationRate (meter, Fetch/Produce/topic): already at Produce/Fetch level

                
> Improve Kafka internal metrics
> ------------------------------
>
>                 Key: KAFKA-203
>                 URL: https://issues.apache.org/jira/browse/KAFKA-203
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>              Labels: tools
>
> Currently, metrics in Kafka use old-school JMX directly. This makes 
> adding metrics a pain. It would be good to do one of the following:
> 1. Convert to Coda Hale's metrics package 
> (https://github.com/codahale/metrics)
> 2. Write a simple metrics package
> The new metrics package should make metrics easier to add and work with, and 
> should package up the common logic of keeping windowed gauges, histograms, 
> counters, etc. JMX should be just one output of this.
> The advantage of the Coda Hale package is that it already exists, so we don't 
> need to write it. The downsides are that it (1) introduces another client 
> dependency, which causes conflicts, and (2) seems a bit heavy on design. The 
> good news is that the metrics-core package doesn't seem to bring in many 
> dependencies, though the Scala wrapper seems to want Scala 2.9. I am also a 
> little skeptical of its approach to histograms: it does sampling instead of 
> bucketing, though that may be okay.
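
The sampling-versus-bucketing trade-off mentioned in the issue description can be illustrated with a small sketch. Both classes below are hypothetical and are not taken from Kafka or from the Coda Hale package; the sampling variant uses a plain uniform reservoir (Vitter's Algorithm R) as an assumed stand-in for whatever sampling scheme a real library applies.

```python
import bisect
import random


class BucketHistogram:
    """Fixed buckets: every value is counted, but only to bucket precision."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra bucket catches values above the last bound.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def update(self, value):
        # bisect_left makes each upper bound inclusive.
        self.counts[bisect.bisect_left(self.upper_bounds, value)] += 1


class ReservoirHistogram:
    """Uniform reservoir: exact values, but only a random subset is kept."""

    def __init__(self, size, seed=None):
        self.size = size
        self.count = 0
        self.sample = []
        self.rng = random.Random(seed)

    def update(self, value):
        self.count += 1
        if len(self.sample) < self.size:
            self.sample.append(value)
        else:
            # Keep the new value with probability size / count.
            slot = self.rng.randrange(self.count)
            if slot < self.size:
                self.sample[slot] = value


latencies_ms = [0.5, 3, 7, 42, 250, 8, 9]
buckets = BucketHistogram([1, 10, 100])
reservoir = ReservoirHistogram(size=4, seed=0)
for latency in latencies_ms:
    buckets.update(latency)
    reservoir.update(latency)
```

Bucketing never drops an observation but loses resolution inside each bucket; sampling keeps exact values in bounded memory but may miss rare outliers, which is the source of the skepticism above.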

