We have an agent (tcollector) that collects various metrics and sends them to OpenTSDB. We then have a simple Python script that queries OpenTSDB for a given metric and alerts on a threshold. We currently gather about 1,300 metrics at roughly 10,000 datapoints/sec.
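
To illustrate the threshold-alert idea above, here is a minimal sketch, not the actual script linked in this thread. It assumes the query result is plain text with one `metric timestamp value tags` line per datapoint, and it uses the conventional Nagios plugin exit codes. The names `parse_ascii` and `check_threshold` are made up for this example.

```python
import operator

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_ascii(body):
    """Extract the value column from 'metric timestamp value tag=val ...' lines.

    The line layout is an assumption about the query output format.
    """
    values = []
    for line in body.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        values.append(float(fields[2]))
    return values


def check_threshold(values, warning, critical, compare=operator.gt):
    """Return (exit_code, message) for a list of datapoint values.

    A single datapoint breaching a threshold is enough to raise the status.
    """
    if not values:
        return UNKNOWN, "UNKNOWN: no datapoints returned"
    if any(compare(v, critical) for v in values):
        return CRITICAL, "CRITICAL: threshold %s breached" % critical
    if any(compare(v, warning) for v in values):
        return WARNING, "WARNING: threshold %s breached" % warning
    return OK, "OK: %d datapoints within thresholds" % len(values)
```

A real check would fetch `body` over HTTP from the TSD and pass the returned code to `sys.exit()` so Nagios can read the status. Passing `compare=operator.lt` turns it into a "too low" check.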
https://github.com/stumbleupon/tcollector
http://opentsdb.net/nagios.html

We're just starting to use Kafka here. I'm not aware yet of a Kafka collector, but I'm sure one will be coming soon. They're really simple to write. You can look at either the elasticsearch collector or the HBase collector as an example of different styles. The HBase one forks a JMX process to read metrics from the running JVM; the elasticsearch one queries the running process for metrics via an ElasticSearch endpoint.

--Dave

On Mon, Apr 9, 2012 at 12:30 AM, liu brent <liubaoc...@gmail.com> wrote:
> Hi,
> Currently, there are 20+ kinds of web services running on hundreds of
> machines in 4 data centers, all written in Java. We want to collect
> and store statistics on the JVM, JMX MBeans, CPU usage, etc. of the services
> and machines, and then send warnings to Nagios. Does anyone have
> experience using Kafka to do so?
> The direct solution seems to be installing a client on every machine
> that works as a JMX client, collects the information, and acts as a Kafka
> producer. Does anyone have better ideas or suggestions?
>
> Thanks,
> Liu
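
As a footnote to the point about collectors being simple to write: as far as I understand it, a tcollector collector is just a long-running program that prints one `metric timestamp value tag=value` line per datapoint to stdout. The sketch below assumes that format. The function names and the `kafka.messages_in` metric are hypothetical, and `read_metrics` stands in for whatever actually talks to JMX or an HTTP endpoint.

```python
import sys
import time


def format_datapoint(metric, value, tags=None, timestamp=None):
    """Return one line: 'metric timestamp value tag1=v1 tag2=v2'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_str = " ".join("%s=%s" % (k, v) for k, v in sorted((tags or {}).items()))
    # rstrip drops the trailing space when there are no tags
    return ("%s %d %s %s" % (metric, ts, value, tag_str)).rstrip()


def collect_once(read_metrics, out=sys.stdout, timestamp=None):
    """Write one line per (metric, value, tags) tuple from read_metrics()."""
    for metric, value, tags in read_metrics():
        out.write(format_datapoint(metric, value, tags, timestamp) + "\n")
    out.flush()  # flush so the parent process sees the lines promptly


def run(read_metrics, interval=15):
    """Collector main loop: emit datapoints every `interval` seconds."""
    while True:
        collect_once(read_metrics)
        time.sleep(interval)


# Hypothetical usage:
#   run(lambda: [("kafka.messages_in", 42, {"topic": "events"})])
```

Keeping the formatting separate from the metric source means the same loop can wrap either style mentioned above: a forked JMX reader or an HTTP stats query.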