Log runtime stats for analysis by ops
-------------------------------------

                 Key: CASSANDRA-36
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-36
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jonathan Ellis


We need to log stats that indicate node / cluster health.  Some that I can 
think of are:

 - pending operation (queue) length for each executor pool [memtable per CF, 
memtablemanager, storageservice.bootstraper, storageservice.consistencymanager]
 - memtable size (which is more useful: per CF or total?)
 - sstable size, per CF
 - number of unmerged SSTables, per CF
 - size of sstable indexes (this is the other major semi-permanent memory chunk)
 - writes, reads per second (throughput)
 - average seconds per write / read (latency)
 - percent of reads that have to hit an SSTable (we don't know whether the data 
is in the OS cache or not, so is this actually useful?)
 - commitlog on-disk size (want to make sure these are getting cleaned out 
regularly)
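
As a rough illustration of the first item in the list above: if a stage is backed by 
a java.util.concurrent.ThreadPoolExecutor, its pending length can be read from 
the executor's work queue. This is only a sketch; the class and method names 
(PendingTasks, pendingTasks, demoPendingCount) are made up for the example and 
are not existing Cassandra code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Cassandra: reports how many tasks are
// waiting in an executor's queue (tasks already running are not counted).
final class PendingTasks {
    static int pendingTasks(ThreadPoolExecutor executor) {
        return executor.getQueue().size();
    }

    // Small self-check: block the single worker, queue three more tasks,
    // and read the pending length while they are still waiting.
    static int demoPendingCount() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await(); // the worker is now busy, so later tasks must queue
            for (int i = 0; i < 3; i++) {
                pool.execute(() -> { });
            }
            return pendingTasks(pool);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // unblock the worker so the pool can drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pending=" + demoPendingCount());
    }
}
```

The queue size is a point-in-time reading, so it is probably best sampled 
periodically rather than logged on every operation.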

Currently some of these are logged in an ad-hoc manner, e.g. time to read in 
ReadVerbHandler, but aggregation is not done and logs on a per-op basis are 
going to get quite spammy.  I'd like one thread to be in charge of logging, to 
dump aggregate data (at level INFO) every minute or so.
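
A minimal sketch of what such a logging thread might look like, assuming 
per-operation code only bumps cheap counters and a single scheduled thread 
drains them. OpStats and StatsLogger are hypothetical names, and 
java.util.logging is used only to keep the example self-contained; the real 
code would use whatever logger the project already has.

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Logger;

// Hypothetical per-operation counters (e.g. one instance for reads, one for writes).
final class OpStats {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Called on the hot path once per operation; no logging happens here.
    void record(long elapsedNanos) {
        count.increment();
        totalNanos.add(elapsedNanos);
    }

    // Summarizes the operations recorded since the previous call, then resets.
    // The two counters are reset separately, so an operation recorded
    // concurrently may be split across two intervals.
    String drain(String name, double intervalSeconds) {
        long n = count.sumThenReset();
        long nanos = totalNanos.sumThenReset();
        double perSecond = n / intervalSeconds;
        double avgMillis = n == 0 ? 0.0 : nanos / 1e6 / n;
        return String.format(Locale.ROOT, "%s: %d ops, %.2f ops/s, avg %.3f ms",
                name, n, perSecond, avgMillis);
    }
}

// Hypothetical aggregator: one daemon thread logs a summary at INFO per period.
final class StatsLogger {
    private static final Logger LOG = Logger.getLogger("StatsLogger");

    final OpStats reads = new OpStats();
    final OpStats writes = new OpStats();

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stats-logger");
                t.setDaemon(true);
                return t;
            });

    void start(long period, TimeUnit unit) {
        double seconds = unit.toMillis(period) / 1000.0;
        scheduler.scheduleAtFixedRate(() -> {
            LOG.info(reads.drain("reads", seconds));
            LOG.info(writes.drain("writes", seconds));
        }, period, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

A handler would then call something like stats.reads.record(elapsedNanos) 
instead of logging each operation, and the owner would call 
stats.start(1, TimeUnit.MINUTES) once at startup.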

Might also be nice to expose this on the web console.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
