symat commented on a change in pull request #369: HBASE-21606 document meta table load metrics
URL: https://github.com/apache/hbase/pull/369#discussion_r302410216
 
 

 ##########
 File path: src/main/asciidoc/_chapters/ops_mgt.adoc
 ##########
 @@ -1738,6 +1738,83 @@ hbase.regionserver.authenticationFailures::
 hbase.regionserver.mutationsWithoutWALCount ::
   Count of writes submitted with a flag indicating they should bypass the write ahead log
 
+[[rs_meta_metrics]]
+=== Meta Table Load Metrics
+
+HBase meta table metrics collection feature is available in HBase 1.4+ but it is disabled by default, as it can
+affect the performance of the cluster. When it is enabled, it helps to monitor client access patterns by collecting
+the following statistics:
+
+* number of get, put and delete operations on the `hbase:meta` table
+* number of get, put and delete operations made by the top-N clients
+* number of operations related to each table
+* number of operations related to the top-N regions
+
+When to use the feature::
+  This feature can help to identify hot spots in the meta table by showing the regions or tables where the meta info is
+  modified (e.g. by create, drop, split or move tables) or retrieved most frequently. It can also help to find misbehaving
+  client applications by showing which clients are using the meta table most heavily, which can for example suggest the
+  lack of meta table buffering or the lack of re-using open client connections in the client application.
+
+.Possible side-effects of enabling this feature
+[WARNING]
+====
+Having large number of clients and regions in the cluster can cause the registration and tracking of a large amount of
+metrics, which can increase the memory and CPU footprint of the HBase region server handling the `hbase:meta` table.
+It can also cause the significant increase of the JMX dump size, which can affect the monitoring or log aggregation
+system you use beside HBase. It is recommended to turn on this feature only during debugging.
+====
+
+Where to find the metrics::
+  Each metric attribute name will start with the ‘MetaTable_’ prefix. For all the metrics you will see five different
+  JMX attributes: count, mean rate, 1 minute rate, 5 minute rate and 15 minute rate. You will find these metrics in JMX
+  under the following MBean:
+  `Hadoop -> HBase -> RegionServer -> Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics`
+
+Configuration::
+  To turn on this feature, you have to enable a custom coprocessor by adding the following section to hbase-site.xml.
+  This coprocessor will run on all the HBase RegionServers, but will be active (i.e. consume memory / CPU) only on
+  the region, where the `hbase:meta` table is located. It will produce JMX metrics which can be downloaded from the
+  web UI of the given RegionServer or by a simple REST call.
+
+.Enabling the Meta Table Metrics feature
+[source,xml]
+----
+<property>
+    <name>hbase.coprocessor.region.classes</name>
+    <value>org.apache.hadoop.hbase.coprocessor.MetaTableMetrics</value>
+</property>
+----
+
+.How the top-N metrics are calculated?
+[NOTE]
+====
+The 'top-N' type of metrics will be counted using the lossy count algorithm, which is about to identify elements in a
+data stream whose frequency count exceed a user-given threshold. The frequency computed by this algorithm is not always
+accurate, but has an error threshold that can be specified by the user as a configuration parameter.
+The run time space required by the algorithm is inversely proportional to the specified error threshold, hence larger
+the error parameter, the smaller the footprint and the less accurate are the metrics. (see the following paper:
+link:http://www.vldb.org/conf/2002/S10P03.pdf[Motwani, R; Manku, G.S (2002). "Approximate frequency counts over data streams"])
+
+You can specify the error rate of the algorithm as a floating-point value between 0 and 1 (exclusive), it's default
+value is 0.02. Having the error rate set to `E` and having `N` as the total number of meta table operations, then
+(assuming the random distribution of the activity of low frequency elements) at most `7 / E` meters will be kept and
 
 Review comment:
   I copied that from the original paper, but I think you are right: 'uniform' is more specific than 'random' when describing a distribution. I will change this as well.
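
   For context on the `7 / E` bound discussed above, here is a minimal Python sketch of the lossy counting scheme described in the cited paper. It is an illustration only, not the HBase implementation; the class name `LossyCounter` and its methods are hypothetical, and the keys stand in for whatever the coprocessor would track (client hosts, regions).

```python
import math


class LossyCounter:
    """Illustrative lossy counting sketch (hypothetical names, not HBase code)."""

    def __init__(self, error_rate):
        if not 0.0 < error_rate < 1.0:
            raise ValueError("error_rate must be between 0 and 1 (exclusive)")
        self.error_rate = error_rate
        # Each bucket spans ceil(1 / E) stream elements.
        self.bucket_width = math.ceil(1.0 / error_rate)
        self.total = 0
        # key -> [observed count, maximum possible undercount]
        self.entries = {}

    def add(self, key):
        self.total += 1
        bucket = math.ceil(self.total / self.bucket_width)
        entry = self.entries.get(key)
        if entry is None:
            # A new key may have been seen and pruned in earlier buckets.
            self.entries[key] = [1, bucket - 1]
        else:
            entry[0] += 1
        # At each bucket boundary, drop keys that cannot be frequent.
        if self.total % self.bucket_width == 0:
            self.entries = {
                k: e for k, e in self.entries.items() if e[0] + e[1] > bucket
            }

    def estimate(self, key):
        entry = self.entries.get(key)
        return entry[0] if entry else 0
```

   The estimate never exceeds the true count and undercounts by at most `E * N`, which is why a larger error rate keeps fewer entries at the cost of accuracy.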

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
