On 5/4/2010 7:21 AM, Eric Evans wrote:
> On Tue, 2010-05-04 at 08:41 +0300, Ran Tavory wrote:
>> How about the following compromise:
>> Add a simple web server to each node with only one simple servlet that
>> simply spits out all JMX stats on one page. Not fancy, no graphs,
>> simply the same values you can get from jconsole, but on a web page.
>> To me it seems like a fair tradeoff b/w maintenance and easier out of
>> the box management. Shooting up jconsole for each server is
>> cumbersome, at least in the environment I work in (firewalls, high
>> latency etc) so a web interface can be nice.
>
> It still seems superfluous to me, but I'd be open to something
> fire-and-forget (i.e. wouldn't need updating each time something new was
> added).

This is how we monitor our Cassandra clusters. Each Cassandra node runs
a process that polls the JMX stats and then fires off events to a set of
configured management nodes using either UDP or multicast, depending on
the network. New Cassandra nodes in the same cluster and datacenter
have the same config (and are configured centrally anyway), and the
management nodes automatically add new nodes based on the events they
receive, so all the graphs, dashboards, monitors, and downstream tools
pick all of this up without needing a change. This way we don't need to
fire up jconsole for hundreds of nodes and can do other interesting
cluster-wide aggregations. Also, we don't have to remember to set up
monitoring when the cluster grows.
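
To make the shape of the per-node process concrete, here is a rough sketch
in Java. It is not our actual agent: the class name, the event format, and
port 9999 are made up for illustration, and it reads the heap usage of its
own JVM rather than connecting to Cassandra's JMX port and reading
Cassandra's MBeans.

```java
import java.lang.management.ManagementFactory;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical agent: names, event format, and port are assumptions.
class JmxUdpAgent {

    // Read one JMX attribute from the local JVM and format it as a
    // one-line event. A real agent would read the Cassandra MBeans.
    static String heapEvent(String nodeName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData heap =
                (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        long used = (Long) heap.get("used");
        return "node=" + nodeName + " metric=heap.used value=" + used;
    }

    // Send the event as a single UDP datagram. There is no delivery
    // guarantee, which is acceptable for periodic stats.
    static void send(String event, InetAddress host, int port) throws Exception {
        byte[] payload = event.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, host, port));
        }
    }

    public static void main(String[] args) throws Exception {
        // Sends one event and exits; a real agent would repeat this on a timer.
        String node = args.length > 0 ? args[0] : "node1";
        send(heapEvent(node), InetAddress.getLoopbackAddress(), 9999);
    }
}
```

Because every event carries the node name, a management node can register
a node the first time it hears from it, which is what lets new nodes show
up without any change on the monitoring side.
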
All the tools used are open source, and I'd be happy to share more
detail if there is interest.