[Cassandra Wiki] Update of "NodeTool" by IanDanforth

Apache Wiki Wed, 10 Aug 2011 10:23:19 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "NodeTool" page has been changed by IanDanforth:
http://wiki.apache.org/cassandra/NodeTool?action=diff&rev1=19&rev2=20

  == Scrub ==
  Cassandra v0.7.1 and v0.7.2 shipped with a bug that caused incorrect 
row-level bloom filters to be generated when compacting sstables generated with 
earlier versions.  This would manifest in IOExceptions during column name-based 
queries.  v0.7.3 provides "nodetool scrub" to rebuild sstables with correct 
bloom filters, with no data lost. (If your cluster was never on 0.7.0 or 
earlier, you don't have to worry about this.)  Note that nodetool scrub will 
snapshot your data files before rebuilding, just in case.
  
+ == Cfhistograms ==
+ 
+ Excellent description from: 
http://narendrasharma.blogspot.com/2011/04/cassandra-07x-understanding-output-of.html
+ 
+ The output of the command has the following 6 columns:
+ 
+  * Offset
+  * SSTables
+  * Write Latency
+  * Read Latency
+  * Row Size
+  * Column Count
+ 
+ === Interpreting the output ===
+ 
+  * Offset: The series of values against which the counts in the other 5 
columns are reported. These are the X-axis values of the histograms. The unit 
depends on the column being read.
+  * SSTables: The number of SSTables accessed per read. For example, if a 
read operation accessed 3 SSTables, you will find a positive count against 
Offset 3. The values are recent, i.e. they cover the time elapsed between two 
calls of the command.
+  * Write Latency: The distribution of the number of operations across the 
range of Offset values, which here represent latency in microseconds. For 
example, if 100 operations each took about 5 microseconds, you will find a 
positive count against offset 5.
+  * Read Latency: Similar to Write Latency, but for reads. The values are 
recent, i.e. they cover the time elapsed between two calls of the command.
+  * Row Size: The distribution of rows across the range of Offset values, 
which here represent size in bytes. For example, if you have 100 rows of 2000 
bytes each, you will find a positive count against the offset whose bucket 
covers 2000.
+  * Column Count: Similar to Row Size, except that the Offset values represent 
the number of columns in a row.
+ 
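As a rough illustration of how one column can be read, the sketch below estimates a percentile from (offset, count) pairs. The helper name `percentile_offset` and the sample numbers are made up for this example; they are not produced by nodetool, and the result is only as precise as the bucket widths allow.

```python
def percentile_offset(pairs, fraction):
    """Return the smallest offset at which the cumulative count
    reaches `fraction` of the total count (hypothetical helper)."""
    total = sum(count for _, count in pairs)
    if total == 0:
        return None  # empty histogram, nothing to estimate
    threshold = fraction * total
    running = 0
    for offset, count in pairs:
        running += count
        if running >= threshold:
            return offset
    return pairs[-1][0]


# Made-up (offset, count) pairs standing in for a Read Latency column,
# where the offsets would be in microseconds.
sample = [(86, 10), (103, 40), (124, 30), (149, 15), (179, 5)]

median = percentile_offset(sample, 0.5)
p95 = percentile_offset(sample, 0.95)
print("median <=", median, "us; 95th percentile <=", p95, "us")
```

The returned offset is an upper bound for the percentile, since each count only says that a value fell somewhere in that bucket.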
+ === Some additional details ===
+ 
+ Typically in a histogram the values are plotted over discrete intervals. 
Similarly, Cassandra defines buckets. The number of buckets is 1 more than the 
number of bucket offsets; the last bucket holds values greater than the last 
offset. The values you see in the Offset column of the output are the bucket 
offsets.
+ The bucket offsets start at 1 and grow by a factor of 1.2 each time (rounding 
and removing duplicates). By default the series goes from 1 to around 36M 
(creating 90+1 buckets), which gives timing resolution from microseconds to 
about 36 seconds, with less precision as the numbers get larger (see the 
EstimatedHistogram class).
+
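The offset series described above can be sketched in a few lines. This is an illustration of the stated rule (start at 1, multiply by 1.2, round, skip duplicates), not a copy of the EstimatedHistogram class; the function names are hypothetical, and placing a value in the first bucket whose offset is at least that value is an assumption about how the buckets are filled.

```python
from bisect import bisect_left


def bucket_offsets(size=90):
    """Build `size` offsets: start at 1, grow by a factor of 1.2,
    round to an integer, and bump by one when rounding repeats a value."""
    offsets = [1]
    while len(offsets) < size:
        last = offsets[-1]
        # Integer form of round(last * 1.2), avoiding floating-point error.
        nxt = (12 * last + 5) // 10
        if nxt == last:
            nxt += 1  # "removing duplicates": never repeat an offset
        offsets.append(nxt)
    return offsets


def bucket_index(offsets, value):
    """Index of the first offset >= value (assumed placement rule).
    An index equal to len(offsets) means the extra overflow bucket."""
    return bisect_left(offsets, value)


offsets = bucket_offsets()
print(offsets[:10])                            # first few offsets
print(offsets[bucket_index(offsets, 2000)])    # offset covering 2000
```

With this rule the series begins 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, so small values get their own bucket while larger values share increasingly wide buckets.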
