Log runtime stats for analysis by ops
-------------------------------------
Key: CASSANDRA-36
URL: https://issues.apache.org/jira/browse/CASSANDRA-36
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
We need to log stats that indicate node / cluster health. Some that I can
think of are:
- executor pool pending operation (queue) length [memtable per CF,
memtablemanager, storageservice.bootstraper, storageservice.consistencymanager]
- memtable size (which is more useful: per CF or total?)
- sstable size, per CF
- number of unmerged SSTables, per CF
- size of sstable indexes (this is the other major semi-permanent memory chunk)
- writes, reads per second (throughput)
- average seconds per write / read (latency)
- percent of reads that have to hit an SSTable (we can't tell whether such a
read is served from the OS cache, so is this actually useful?)
- commitlog on-disk size (want to make sure these are getting cleaned out
regularly)
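
As a rough illustration of how the throughput, latency, and queue-length items
above could be collected cheaply, here is a minimal sketch. The class OpStats
and its methods are hypothetical names, not existing Cassandra code; the only
library calls used are java.util.concurrent.atomic.LongAdder and
ThreadPoolExecutor.getQueue(). It assumes one instance per operation type
(reads, writes) and a caller that knows the length of the reporting window.

```java
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical aggregator: one instance for reads, another for writes. */
final class OpStats {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Called once per completed operation; no per-op log line is written. */
    void record(long elapsedNanos) {
        count.increment();
        totalNanos.add(elapsedNanos);
    }

    /** Immutable view of one reporting window. */
    static final class Snapshot {
        final long ops;
        final double opsPerSecond; // throughput
        final double avgSeconds;   // average latency per operation

        Snapshot(long ops, double opsPerSecond, double avgSeconds) {
            this.ops = ops;
            this.opsPerSecond = opsPerSecond;
            this.avgSeconds = avgSeconds;
        }
    }

    /**
     * Returns the aggregates for the window that just ended and resets the
     * counters. The two counters are reset separately, so under concurrent
     * updates the result is approximate, which is acceptable for ops logging.
     */
    Snapshot snapshotAndReset(double windowSeconds) {
        long ops = count.sumThenReset();
        long nanos = totalNanos.sumThenReset();
        double avg = ops == 0 ? 0.0 : (nanos / 1e9) / ops;
        return new Snapshot(ops, ops / windowSeconds, avg);
    }

    /** Pending (queued, not yet running) tasks of an executor pool. */
    static int pendingOps(ThreadPoolExecutor pool) {
        return pool.getQueue().size();
    }
}
```

A handler would call record(System.nanoTime() - start) when an operation
finishes, and the reporting side would call snapshotAndReset(60.0) once per
minute. Sizes (memtable, sstable, commitlog) are not covered by this sketch.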
Currently some of these are logged in an ad-hoc manner, e.g. time to read in
ReadVerbHandler, but aggregation is not done and logs on a per-op basis are
going to get quite spammy. I'd like one thread to be in charge of logging, to
dump aggregate data (at level INFO) every minute or so.
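
The single logging thread described above might look roughly like the
following sketch. StatsLogger, register, and formatLine are hypothetical names,
and java.util.logging is used only to keep the example self-contained; the
project's actual logging framework would be substituted. Each stat is
registered as a named supplier, and one daemon thread writes a single INFO
line per period.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical periodic reporter: one thread, one aggregate line per period. */
final class StatsLogger {
    private static final Logger logger = Logger.getLogger(StatsLogger.class.getName());

    // Insertion-ordered so the log line has a stable column order.
    private final Map<String, LongSupplier> gauges = new LinkedHashMap<>();

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stats-logger");
            t.setDaemon(true); // never keep the JVM alive just for stats
            return t;
        });

    /** Registers a named value to be sampled each time a line is logged. */
    synchronized void register(String name, LongSupplier gauge) {
        gauges.put(name, gauge);
    }

    /** Samples every gauge and builds one aggregate line. */
    synchronized String formatLine() {
        StringBuilder sb = new StringBuilder("stats:");
        for (Map.Entry<String, LongSupplier> e : gauges.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue().getAsLong());
        }
        return sb.toString();
    }

    /** Starts logging at INFO once per period (e.g. 1, TimeUnit.MINUTES). */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                logger.info(formatLine());
            } catch (RuntimeException ex) {
                // An uncaught exception would cancel all future runs.
                logger.log(Level.WARNING, "stats logging failed", ex);
            }
        }, period, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Keeping the sampling behind suppliers means the handlers only update counters,
and the choice of what to print stays in one place.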
Might also be nice to expose this on the web console.