I use the standard checks to confirm that the process is running, plus a
check in ZooKeeper that verifies correct partition ownership and the number
of registered brokers / consumers / producers.
Collectd runs on all my machines and pushes JMX metrics out to
Graphite. I then use check-graphite, which allows checking for consumer
lag.
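
For anyone wiring up something similar, here is a rough sketch of the
threshold logic behind a Nagios-style lag check against Graphite. It is not
how check-graphite itself is implemented. It assumes the JSON shape returned
by Graphite's render API with format=json (a list of series, each with
[value, timestamp] datapoints). The metric name and the thresholds are made
up for illustration.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def latest_value(render_json):
    """Return the most recent non-null value in a Graphite render response."""
    for series in json.loads(render_json):
        # Datapoints are [value, timestamp] pairs, oldest first.
        # Graphite reports gaps as null, so walk backwards past them.
        for value, _timestamp in reversed(series["datapoints"]):
            if value is not None:
                return value
    return None


def check_lag(render_json, warn, crit):
    """Map the latest consumer lag to a Nagios status code and message."""
    lag = latest_value(render_json)
    if lag is None:
        return UNKNOWN, "UNKNOWN: no datapoints for consumer lag"
    if lag >= crit:
        return CRITICAL, "CRITICAL: consumer lag %d >= %d" % (lag, crit)
    if lag >= warn:
        return WARNING, "WARNING: consumer lag %d >= %d" % (lag, warn)
    return OK, "OK: consumer lag %d" % lag


# Hypothetical response; "kafka.consumer.lag" is an assumed metric name.
SAMPLE = (
    '[{"target": "kafka.consumer.lag", "datapoints": '
    "[[100.0, 1343460000], [5000.0, 1343460060], [null, 1343460120]]}]"
)

status, message = check_lag(SAMPLE, warn=1000, crit=10000)
print(message)
```

In a real plugin you would fetch the JSON from your Graphite server and
exit with the returned status code so Nagios can pick it up.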

On Sat, Jul 28, 2012 at 5:32 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> LinkedIn has a custom monitoring system partially described here:
> http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection
>
> The integration from the kafka side is basically just jmx, though we have a
> few wrappers that expose additional things. We measure basic stuff like
> disk stats, messages/sec, latency, etc.
>
> In addition we do a very Kafka-specific kind of monitoring we call
> "audit". This counts the number of messages sent by every producer,
> received by every broker, and received by every consumer and reconciles and
> graphs and alerts on these counts. This is very helpful in determining that
> all the sent data arrived at its destination. There is a bug open to open
> source this piece, though it has a few dependencies.
>
> https://issues.apache.org/jira/browse/KAFKA-260
>
> -Jay
>
> On Fri, Jul 27, 2012 at 6:00 PM, Jonathan Creasy <jcre...@box.com> wrote:
>
>> How do you guys monitor Kafka? Do any of you have Nagios checks that you
>> use? What metrics do you find important?
>>
