[
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647849#comment-16647849
]
Archana Katiyar edited comment on HBASE-21301 at 10/12/18 12:20 PM:
--------------------------------------------------------------------
*Summary of the work done so far*:
* Store data in an HBase table (new system table)
** We will store stats for all the regions corresponding to a given table in
this table.
** TODO: Decide on the schema; [[email protected]] suggested taking the
[OpenTSDB schema|http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html]
as a reference.
* Add a ScheduledChore in the HRegionServer class; this chore will wake up every
x minutes (configurable) and store the read/write counts for the last x minutes
in the table. In the future, the same chore can be used to store other stats as
well (a rough sketch follows this list). There were two options for recording
read and write stats for the last x minutes in the HRegion class:
** Introduce new read and write counters that are incremented on each user
operation; the ScheduledChore resets them after recording the current values.
** Use the existing read and write counters of HRegion; the ScheduledChore then
derives the stats for the last x minutes itself (the existing counters track
totals from the time the region came online).
I am implementing the second option (existing counters) so that the per-read/write
performance impact of this change stays minimal.
* Add a new JSP page that reads data from the table and displays it as a
heatmap. The logic of this page is simple: given a table name and an epoch
time, query the stats for all regions that were live at that time.
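
To make the chore and row-key discussion above concrete, here is a rough,
non-authoritative sketch of what such a ScheduledChore could look like. The
stats table name ({{hbase:regionstats}}), the column layout, and the row key
{{<userTable>,<intervalEndMillis>,<encodedRegion>}} are placeholders invented
for illustration only; the real schema is still the TODO above.
{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.ScheduledChore;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.HRegionServer;
import org.apache.hadoop.hbase.util.Bytes;

/** Illustrative only: periodically persists per-region read/write deltas. */
public class RegionStatsChore extends ScheduledChore {
  // Placeholder stats table name; the real name/schema is still to be decided.
  private static final TableName STATS_TABLE = TableName.valueOf("hbase:regionstats");

  private final HRegionServer regionServer;
  private final Connection connection;
  // Last observed lifetime counters per encoded region name, so we can emit
  // deltas ("last x minutes") without adding new counters to the hot path.
  private final Map<String, long[]> lastSnapshot = new HashMap<>();

  public RegionStatsChore(HRegionServer regionServer, Connection connection, int period) {
    super("RegionStatsChore", regionServer, period); // period in the chore's default unit
    this.regionServer = regionServer;
    this.connection = connection;
  }

  @Override
  protected void chore() {
    long intervalEnd = System.currentTimeMillis();
    try (Table stats = connection.getTable(STATS_TABLE)) {
      // Online regions on this server (getRegions() on recent branches).
      for (HRegion region : regionServer.getRegions()) {
        long reads = region.getReadRequestsCount();
        long writes = region.getWriteRequestsCount();
        String encoded = region.getRegionInfo().getEncodedName();
        long[] prev = lastSnapshot.put(encoded, new long[] { reads, writes });
        long readDelta = prev == null ? reads : reads - prev[0];
        long writeDelta = prev == null ? writes : writes - prev[1];
        // Hypothetical row key: <userTable>,<intervalEndMillis>,<encodedRegion>,
        // so the heatmap page can prefix-scan by table name and time.
        byte[] rowKey = Bytes.toBytes(region.getRegionInfo().getTable().getNameAsString()
            + "," + intervalEnd + "," + encoded);
        Put put = new Put(rowKey)
            .addColumn(Bytes.toBytes("s"), Bytes.toBytes("r"), Bytes.toBytes(readDelta))
            .addColumn(Bytes.toBytes("s"), Bytes.toBytes("w"), Bytes.toBytes(writeDelta));
        stats.put(put);
      }
    } catch (IOException e) {
      // Best effort: stats collection should never destabilize the region server.
    }
  }
}
{code}
With a layout like this, the heatmap JSP would simply prefix-scan the same table
by table name and interval time and render one cell per region; whether time
belongs in the row key like this, or in an OpenTSDB-style layout, is exactly the
open schema question noted above.
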
Also, [~apurtell] suggested eventually storing information per store file (maybe
not in v1 of this feature, but it is a good goal to have). In his own words -
_"Regarding what granularity to use for statistics collection, you are
definitely on the right track to start with the region as the smallest unit to
consider. I believe Google's design of Key Visualizer can drill down to
narrower, sub-region, scopes, so I have been thinking about how to achieve
that, if we want. I would not recommend doing it for the first cut because we
already have support for region level metrics that you can build on. However,
imagine during compaction we collect statistics over all K-Vs in every HFile,
then write the statistics into the hfile file trailer, then retrieve those
statistics later using a new API. This will let us do things like alter
compaction strategy decisions with greater awareness of the characteristics of
the data in the store (see W-5473921 Enhance compaction upgrade decision to
consider file statistics) or potentially generate heatmaps of key access rates
at a store file granularity. Each store file will give you a key-range and a
read and write access count that you can aggregate. The start and end keys of
those ranges will be different from region start and end keys because store
files only have a subset of all keys in the region. This lets us find hot
regions that are narrower in scope than the region, which will be more precise
information on how to, potentially, split the keyspace to better distribute the
load, or to narrow down what aspect of application data model or implementation
is responsible for the hotspot. I don't know _how_ to track key access stats
with sub region granularity, though. We would need this information on hand to
write into the hfile during compaction. Maybe we could sample reads and writes
at the HRegion level and keep the derived stats in an in-memory data structure
in the region. (Much lower overhead to keep it in-memory and local than attempt
to persist to a real table.) We would persist relevant stats from this
datastructure into the store files written during flushes and compactions."_
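
As a rough illustration (my own sketch, not part of the current patch) of the
in-memory sampling idea Andrew describes above: row-key accesses could be
sampled on the read/write path and bucketed by key prefix, and a flush or
compaction could later aggregate the buckets falling inside an HFile's key range
before persisting them, e.g. into the file trailer. All class names and
constants below are invented for illustration.
{code:java}
import java.util.Arrays;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

import org.apache.hadoop.hbase.util.Bytes;

/** Illustrative only: sampled, in-memory access counts bucketed by row-key prefix. */
public class KeyAccessSampler {
  private static final double SAMPLE_RATE = 0.01; // sample ~1% of operations
  private static final int PREFIX_LEN = 8;        // bucket keys by an 8-byte prefix

  // Sorted by key so a flush/compaction can walk the buckets covered by an
  // HFile's [firstKey, lastKey) range.
  private final ConcurrentSkipListMap<byte[], LongAdder> buckets =
      new ConcurrentSkipListMap<>(Bytes.BYTES_COMPARATOR);

  /** Called from the read/write path; most calls return after one random draw. */
  public void maybeRecord(byte[] rowKey) {
    if (ThreadLocalRandom.current().nextDouble() >= SAMPLE_RATE) {
      return;
    }
    byte[] prefix = Arrays.copyOfRange(rowKey, 0, Math.min(PREFIX_LEN, rowKey.length));
    buckets.computeIfAbsent(prefix, p -> new LongAdder()).increment();
  }

  /** Approximate access count for keys in [start, end), e.g. one store file's range. */
  public long estimateAccesses(byte[] start, byte[] end) {
    long sampled = 0;
    for (LongAdder adder : buckets.subMap(start, end).values()) {
      sampled += adder.sum();
    }
    return Math.round(sampled / SAMPLE_RATE);
  }
}
{code}
Sampling keeps the hot-path cost to a single random draw most of the time, which
matches the "much lower overhead to keep it in-memory and local" point in the
quote.
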
> Heatmap for key access patterns
> -------------------------------
>
> Key: HBASE-21301
> URL: https://issues.apache.org/jira/browse/HBASE-21301
> Project: HBase
> Issue Type: Improvement
> Reporter: Archana Katiyar
> Assignee: Archana Katiyar
> Priority: Major
>
> Google recently released a beta feature for Cloud Bigtable which presents a
> heat map of the keyspace. *Given how hotspotting comes up now and again here,
> this is a good idea for giving HBase ops a tool to be proactive about it.*
> >>>
> Additionally, we are announcing the beta version of Key Visualizer, a
> visualization tool for Cloud Bigtable key access patterns. Key Visualizer
> helps debug performance issues due to unbalanced access patterns across the
> key space, or single rows that are too large or receiving too much read or
> write activity. With Key Visualizer, you get a heat map visualization of
> access patterns over time, along with the ability to zoom into specific key
> or time ranges, or select a specific row to find the full row key ID that's
> responsible for a hotspot. Key Visualizer is automatically enabled for Cloud
> Bigtable clusters with sufficient data or activity, and does not affect Cloud
> Bigtable cluster performance.
> <<<
> From
> [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html]
> (Copied this description from the write-up by [~apurtell], thanks Andrew.)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)