[
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649249#comment-16649249
]
Archana Katiyar commented on HBASE-21301:
-----------------------------------------
Thanks [~stack] and [~allan163] for your suggestions.
We need to store stats for at least a couple of weeks; this is because we have
observed cases in the past where we learned there was hotspotting only after
1-2 weeks (this includes the time taken by the affected team to report the
case, then the SR team takes its own time, then our prod support takes some
time to figure out the right direction to debug, etc.). If there is continuous
hotspotting then things are easy to handle; but typically it happens when the
usage pattern changes suddenly (for example, a huge data copy because of a data
migration), and by the time we realize that something went wrong, the usage
pattern has come back to normal. Also, just to add to the grief, the affected
team is typically not the one that caused the problem in the first place.
This is why we were thinking of a 30-day TTL for the stats data.
With this much of a retention requirement, keeping everything in memory does
not look like a feasible option, especially once we start collecting stats at
the store file level (or further, at the data block level).
If a system table is not a feasible option, then we can think of any of the
following options -
* Store the data in a user table (I am hoping user tables have less overhead
than system tables); we can make sure this table is created from the hbase
start shell script. We would still need some fallback implementation in case
this table is not available (a rough sketch of the table creation follows this
list).
* Use an underlying file in HDFS (or on local disk) and implement some sort of
append-only write path. We would also need housekeeping to delete old data;
basically, we need circular-linked-list-style storage that also provides
persistence.
* Use memory-mapped files (mentioned in
[https://github.com/DataSketches/memory]) backed by the underlying storage.
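For the first (user table) option, here is a minimal sketch of what creating
the stats table with the 30-day retention expressed as a column-family TTL
could look like, assuming the HBase 2.x Java Admin API; the table name, family
name and class name below are placeholders for illustration, not a proposal:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class StatsTableSetup {

  // Placeholder names; the actual table/family naming is still open.
  private static final TableName STATS_TABLE = TableName.valueOf("key_access_stats");
  private static final byte[] STATS_FAMILY = Bytes.toBytes("s");
  private static final int THIRTY_DAYS_IN_SECONDS = 30 * 24 * 60 * 60;

  /** Creates the stats table with a 30-day TTL if it does not exist yet. */
  public static void createIfMissing(Configuration conf) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      if (admin.tableExists(STATS_TABLE)) {
        return; // already provisioned, e.g. by the start script
      }
      admin.createTable(TableDescriptorBuilder.newBuilder(STATS_TABLE)
          .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(STATS_FAMILY)
              // 30-day TTL: old heatmap samples age out via normal compactions
              .setTimeToLive(THIRTY_DAYS_IN_SECONDS)
              .build())
          .build());
    }
  }

  public static void main(String[] args) throws IOException {
    createIfMissing(HBaseConfiguration.create());
  }
}
{code}
The same could be done from the hbase shell at start-up by setting TTL on the
column family; the point is just that the 30-day retention gets delegated to
the normal TTL/compaction machinery instead of custom housekeeping code.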
Also, to answer [~allan163]'s concern about reducing read perf - since I am
planning to record the stats periodically (e.g. every 15 mins) and there is no
additional call or statement per read operation (I am planning to use only the
existing counters to collect region-level stats), there should not be any
observable perf difference. In the future, when we extend this to the store
file or block level, we will have to be vigilant about doing it at the right
place and time.
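To make the periodic-recording idea concrete, here is a minimal sketch of the
kind of sampler I have in mind, assuming the HBase 2.x client metrics API
(ClusterMetrics/RegionMetrics); the class name, the persist() placeholder and
the scheduling are illustrative only - the real implementation would likely be
a server-side chore writing into whichever store we pick above:
{code:java}
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.RegionMetrics;
import org.apache.hadoop.hbase.ServerMetrics;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;

/**
 * Illustrative sampler: it only reads the per-region request counters that
 * HBase already maintains, so nothing extra happens on the read/write path.
 */
public class RegionStatsSampler {

  /** Takes one snapshot of the per-region read/write request counters. */
  public static void sampleOnce(Admin admin) throws IOException {
    long timestamp = System.currentTimeMillis();
    ClusterMetrics cluster = admin.getClusterMetrics();
    for (Map.Entry<ServerName, ServerMetrics> server
        : cluster.getLiveServerMetrics().entrySet()) {
      for (RegionMetrics region : server.getValue().getRegionMetrics().values()) {
        persist(timestamp, region.getNameAsString(),
            region.getReadRequestCount(), region.getWriteRequestCount());
      }
    }
  }

  // Placeholder: the real version would write to the stats table / file
  // discussed above and would be scheduled, e.g. every 15 minutes.
  private static void persist(long ts, String regionName, long reads, long writes) {
    System.out.printf("%d %s reads=%d writes=%d%n", ts, regionName, reads, writes);
  }
}
{code}
Because these counters are cumulative, the heatmap layer would subtract
consecutive snapshots rather than plot the raw values (and would need to handle
counter resets when a region moves or splits).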
Comments/suggestions/thoughts are welcome.
> Heatmap for key access patterns
> -------------------------------
>
> Key: HBASE-21301
> URL: https://issues.apache.org/jira/browse/HBASE-21301
> Project: HBase
> Issue Type: Improvement
> Reporter: Archana Katiyar
> Assignee: Archana Katiyar
> Priority: Major
>
> Google recently released a beta feature for Cloud Bigtable which presents a
> heat map of the keyspace. *Given how hotspotting comes up now and again here,
> this is a good idea for giving HBase ops a tool to be proactive about it.*
> >>>
> Additionally, we are announcing the beta version of Key Visualizer, a
> visualization tool for Cloud Bigtable key access patterns. Key Visualizer
> helps debug performance issues due to unbalanced access patterns across the
> key space, or single rows that are too large or receiving too much read or
> write activity. With Key Visualizer, you get a heat map visualization of
> access patterns over time, along with the ability to zoom into specific key
> or time ranges, or select a specific row to find the full row key ID that's
> responsible for a hotspot. Key Visualizer is automatically enabled for Cloud
> Bigtable clusters with sufficient data or activity, and does not affect Cloud
> Bigtable cluster performance.
> <<<
> From
> [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html]
> (Copied this description from the write-up by [~apurtell], thanks Andrew.)