[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649249#comment-16649249
 ] 

Archana Katiyar commented on HBASE-21301:
-----------------------------------------

Thanks [~stack] and [~allan163] for your suggestions.

We need to store the stats for at least a couple of weeks. This is because we 
have observed cases in the past where we came to know about hotspotting only 
after 1-2 weeks (this includes the time taken by the affected team to report 
the case, then the SR team takes its own time, then our prod support takes 
some time to figure out the right direction to debug, etc.). If there is 
continuous hot-spotting then things are easy to handle; but typically it 
happens when the usage pattern changes suddenly (for example, a huge data copy 
because of a data migration), and by the time we realize something went wrong, 
the usage pattern has already returned to normal. Also, just to add to the 
grief - the affected team is typically not the one that caused the problem in 
the first place.

This is why we were thinking of a 30-day TTL for the stats data.

With this retention requirement, keeping everything in memory does not look 
like a feasible option, especially once we start collecting stats at the store 
file (or, further, the data block) level.

If a system table is not a feasible option, then we can consider one of the 
following options:
 * Store the data in a user table (I am hoping user tables have less overhead 
than system tables); we can make sure this table is created from the HBase 
start script. We would still need some fallback implementation in case this 
table is not available. (A minimal sketch of creating such a table follows 
after this list.)
 * Use an underlying file in HDFS (or local disk) and implement some sort of 
append-only write path. We would also need housekeeping to delete old data; 
basically, we need a circular, append-only kind of storage that also provides 
persistence.
 * Use memory-mapped files (mentioned in 
[https://github.com/DataSketches/memory]) on top of the underlying storage.
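
For the user table option, here is a minimal sketch of creating the stats 
table with the 30-day TTL discussed above, assuming the HBase 2.x Java client 
API; the table name "key_access_stats" and the family "s" are placeholder 
names I am using for illustration, not part of the proposal:

{code:java}
// Minimal sketch (assumption: HBase 2.x client API). Creates a user table
// for the stats with a 30-day TTL so old samples age out automatically.
// "key_access_stats" and the "s" family are placeholder names.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateKeyStatsTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName statsTable = TableName.valueOf("key_access_stats");
      if (!admin.tableExists(statsTable)) {
        admin.createTable(TableDescriptorBuilder.newBuilder(statsTable)
            .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("s"))
                .setTimeToLive(30 * 24 * 60 * 60) // 30 days, in seconds
                .build())
            .build());
      }
    }
  }
}
{code}

With the TTL on the column family, old samples age out automatically, so this 
option would not need a separate purge job.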

Also, to answer [~allan163]'s concern about reducing read perf: since I am 
planning to record the stats periodically (e.g. every 15 mins) and there is no 
additional call or statement per read operation (I am planning to use only the 
existing counters to collect region-level stats), there should not be any 
observable perf difference; a rough sketch of this sampling is below. In the 
future, when we extend this to the store file or block level, we will have to 
be vigilant about doing it at the right place and time.
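
To make the "existing counters only" point concrete, here is a rough sketch of 
the periodic sampling, again assuming the HBase 2.x client API (the persist 
step is just a placeholder print, not the proposed storage):

{code:java}
// Rough sketch (assumption: HBase 2.x client API). Every 15 minutes we
// snapshot the per-region read/write request counters that HBase already
// maintains; nothing extra runs per read/write operation.
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionMetrics;
import org.apache.hadoop.hbase.ServerMetrics;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RegionStatsSampler {
  public static void main(String[] args) throws IOException {
    Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> sample(conn), 0, 15, TimeUnit.MINUTES);
  }

  static void sample(Connection conn) {
    try (Admin admin = conn.getAdmin()) {
      long ts = System.currentTimeMillis();
      Map<ServerName, ServerMetrics> servers =
          admin.getClusterMetrics().getLiveServerMetrics();
      for (ServerMetrics server : servers.values()) {
        for (RegionMetrics region : server.getRegionMetrics().values()) {
          // Placeholder: persist (ts, region, counters) to the stats table
          // sketched above instead of printing.
          System.out.println(ts + " " + region.getNameAsString()
              + " reads=" + region.getReadRequestCount()
              + " writes=" + region.getWriteRequestCount());
        }
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
{code}

Note that these request counters are cumulative, so the heatmap would be built 
from deltas between consecutive samples.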

Comments/suggestions/thoughts are welcome.

> Heatmap for key access patterns
> -------------------------------
>
>                 Key: HBASE-21301
>                 URL: https://issues.apache.org/jira/browse/HBASE-21301
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Archana Katiyar
>            Assignee: Archana Katiyar
>            Priority: Major
>
> Google recently released a beta feature for Cloud Bigtable which presents a 
> heat map of the keyspace. *Given how hotspotting comes up now and again here, 
> this is a good idea for giving HBase ops a tool to be proactive about it.* 
> >>>
> Additionally, we are announcing the beta version of Key Visualizer, a 
> visualization tool for Cloud Bigtable key access patterns. Key Visualizer 
> helps debug performance issues due to unbalanced access patterns across the 
> key space, or single rows that are too large or receiving too much read or 
> write activity. With Key Visualizer, you get a heat map visualization of 
> access patterns over time, along with the ability to zoom into specific key 
> or time ranges, or select a specific row to find the full row key ID that's 
> responsible for a hotspot. Key Visualizer is automatically enabled for Cloud 
> Bigtable clusters with sufficient data or activity, and does not affect Cloud 
> Bigtable cluster performance. 
> <<<
> From 
> [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html]
> (Copied this description from the write-up by [~apurtell], thanks Andrew.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
