I use the standard checks to verify that the process is running, plus a check in ZooKeeper for correct partition ownership and the number of registered brokers/consumers/producers. Collectd runs on all my machines and pushes JMX metrics out to Graphite. I then use check-graphite, which allows checking for consumer lag.
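As a rough illustration of the consumer-lag piece, here is a minimal Python sketch of a Nagios-style threshold check over Graphite data. It is not check-graphite itself. It assumes the input is the JSON body returned by Graphite's `/render?target=...&format=json` endpoint (a list of `{"target": ..., "datapoints": [[value, timestamp], ...]}` objects); the function name `check_lag` and the `warn`/`crit` thresholds are hypothetical. Exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_lag(render_json: str, warn: float, crit: float) -> tuple[int, str]:
    """Evaluate the latest non-null datapoint of each Graphite series.

    `render_json` is assumed to be the body of a Graphite
    /render?target=...&format=json request. Returns (exit_code, message).
    """
    try:
        series = json.loads(render_json)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return UNKNOWN, "UNKNOWN - response is not valid JSON"
    if not series:
        return UNKNOWN, "UNKNOWN - no series returned"

    status, parts = OK, []
    for s in series:
        # Graphite reports intervals with no data as null; skip them.
        values = [v for v, _ts in s["datapoints"] if v is not None]
        if not values:
            return UNKNOWN, f"UNKNOWN - no datapoints for {s['target']}"
        lag = values[-1]
        # Keep the worst status seen across all series.
        if lag >= crit:
            status = max(status, CRITICAL)
        elif lag >= warn:
            status = max(status, WARNING)
        parts.append(f"{s['target']}={lag:g}")

    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    return status, f"LAG {label} - " + ", ".join(parts)
```

A small wrapper would fetch the render URL for the lag metric, print the message, and exit with the returned code. The metric name depends entirely on how collectd names the JMX attributes it collects.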
On Sat, Jul 28, 2012 at 5:32 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> LinkedIn has a custom monitoring system partially described here:
> http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection
>
> The integration from the kafka side is basically just jmx, though we have a
> few wrappers that expose additional things. We measure basic stuff like
> disk stats, messages/sec, latency, etc.
>
> In addition we do a very Kafka-specific kind of monitoring we call
> "audit". This counts the number of messages sent by every producer,
> received by every broker, and received by every consumer, and reconciles,
> graphs, and alerts on these counts. This is very helpful in determining that
> all the sent data arrived at its destination. There is a bug open to open
> source this piece, though it has a few dependencies.
>
> https://issues.apache.org/jira/browse/KAFKA-260
>
> -Jay
>
> On Fri, Jul 27, 2012 at 6:00 PM, Jonathan Creasy <jcre...@box.com> wrote:
>
>> How do you guys monitor Kafka? Do any of you have Nagios checks that you
>> use? What metrics do you find important?
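The reconciliation step of the "audit" Jay describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LinkedIn's implementation (KAFKA-260 tracks open-sourcing that): each producer, broker, and consumer is assumed to report a hypothetical `AuditRecord` carrying a per-topic message count for a fixed time window, and `reconcile` (also hypothetical) flags any window where the broker or consumer total differs from the producer total by more than `tolerance`, expressed as a fraction of the producer total.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple

TIERS = ("producer", "broker", "consumer")


class AuditRecord(NamedTuple):
    """One count report (hypothetical format) from a single process."""
    tier: str    # "producer", "broker" or "consumer"
    host: str    # the process that observed the messages
    topic: str
    window: int  # start of the time bucket, e.g. epoch minutes
    count: int   # messages seen by this host for this topic and window


def reconcile(records: Iterable[AuditRecord], tolerance: float = 0.0) -> dict:
    """Return {(topic, window): per-tier totals} for windows whose tiers disagree."""
    # Sum the counts from every host in a tier for each (topic, window).
    totals: dict = defaultdict(Counter)
    for r in records:
        totals[(r.topic, r.window)][r.tier] += r.count

    mismatches = {}
    for key, by_tier in totals.items():
        sent = by_tier["producer"]  # Counter yields 0 for a tier that never reported
        allowed = tolerance * sent
        # Compare both downstream tiers against what the producers sent;
        # abs() catches duplicates as well as losses.
        if any(abs(by_tier[t] - sent) > allowed for t in ("broker", "consumer")):
            mismatches[key] = {t: by_tier[t] for t in TIERS}
    return mismatches
```

In practice a window should only be reconciled after it has closed and consumers have had time to catch up; otherwise messages still in flight show up as false mismatches.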