[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087207#comment-14087207 ]
Nick Dimiduk commented on HBASE-11682:
--------------------------------------
bq. HBase also attempts to store rows near each other in the same region, on
the same region server.
This sentence doesn't help much. A region is a contiguous range of rows that
is physically hosted as a unit. Rows on either side of a region boundary are
lexicographically near each other but belong to different regions, so there
is no guarantee that they are hosted on the same region server.
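To make the boundary point concrete, here is a minimal sketch of how a row key maps to a region when regions are contiguous key ranges. The split points and the `region_for` helper are hypothetical and only model the idea; they are not HBase API.

```python
from bisect import bisect_right

# Hypothetical split points: region i holds keys in [SPLITS[i-1], SPLITS[i]),
# with the first and last regions open-ended.
SPLITS = [b"g", b"n", b"t"]

def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(SPLITS, row_key)

# Two keys that sort next to each other, on either side of the b"n" split.
left = b"mzzz"
right = b"n"
print(region_for(left), region_for(right))
```

The two keys are neighbors in sort order yet land in different regions, and nothing in this model says those regions share a region server.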
bq. However, poorly designed row keys can lead to
<firstterm>hotspotting</firstterm>.
This is where schema/rowkey design and access patterns go hand-in-hand.
bq. Hotspotting occurs when nearly all the rows being written to HBase are
written to the same region, because their row keys are contiguous or very
similar.
I'd say "Hotspotting occurs when too much client traffic is directed at a
single region. The traffic can be reads, writes, or both. It overwhelms
the single machine responsible for hosting that region, causing performance
degradation and potentially leading to region unavailability. It can also
have adverse effects on other regions hosted by the same region server,
because that host is unable to service the requested load."
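A rough sketch of that definition: count requests per region for two access patterns. The split points, the zero-padded timestamp keys, and the helper names are all assumptions made for illustration, and a "request" here stands for either a read or a write.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points for a table keyed by zero-padded timestamps.
SPLITS = [b"1000", b"2000", b"3000"]

def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(SPLITS, row_key)

def load_per_region(row_keys):
    """Count how many requests (reads or writes) land on each region."""
    return Counter(region_for(k) for k in row_keys)

# 500 requests against recent, contiguous keys: all hit the last region.
recent = [b"%04d" % t for t in range(3200, 3700)]
hot = load_per_region(recent)

# 500 requests spread evenly over the whole key space.
spread = [b"%04d" % t for t in range(0, 4000, 8)]
even = load_per_region(spread)

print(dict(hot))
print(dict(even))
```

In the first pattern one region absorbs every request while the other three sit idle, which is the situation the proposed wording describes.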
bq. but in the bigger picture, data is being written to multiple regions across
the cluster ...
Again, not limited to writes.
bq. One technique is to salt the row keys
Is the term "salt" explained?
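For reference, one common reading of "salt" is a short prefix added to the row key so that otherwise contiguous keys sort into different parts of the key space. The sketch below assumes a prefix derived from a hash of the key, so a reader can recompute it; a randomly assigned prefix is another variant. `NUM_BUCKETS` and `salted_key` are hypothetical names, not HBase API.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical; chosen here only for illustration

def salted_key(row_key: bytes) -> bytes:
    """Prefix row_key with a one-byte bucket number derived from its hash."""
    bucket = hashlib.sha256(row_key).digest()[0] % NUM_BUCKETS
    return bytes([bucket]) + row_key

# Contiguous keys that would otherwise all sort into the same region.
keys = [b"%04d" % t for t in range(3200, 3700)]

# Count how many keys fall under each prefix byte.
buckets = Counter(salted_key(k)[0] for k in keys)
print(sorted(buckets.items()))
```

Because the prefix is the leading byte, each bucket sorts into its own slice of the key space, so the contiguous keys no longer pile up in one place.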
bq. However, using totally random row keys would remove any benefit of HBase's
row-sorting algorithm and cause very poor performance, as each get or scan
would need to query all regions.
You're assuming a sequential access pattern here. Random rowkeys can be okay
for random read access patterns, in that load is spread all over the cluster.
I've seen other issues around poor blockcache performance from completely
random access patterns, but that's a slight tangent.
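The access-pattern distinction can be sketched like this, assuming the hash-derived prefix scheme rather than truly random keys. A point read recomputes the prefix and touches one bucket, while a logical range scan has to fan out into one physical range per bucket. All names here are hypothetical.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical bucket count

def bucket_for(row_key: bytes) -> int:
    """Recompute the bucket a key was written under."""
    return hashlib.sha256(row_key).digest()[0] % NUM_BUCKETS

def get_target(row_key: bytes) -> bytes:
    """A point read needs exactly one physical key."""
    return bytes([bucket_for(row_key)]) + row_key

def scan_ranges(start: bytes, stop: bytes):
    """A logical range scan becomes one physical range per bucket."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(NUM_BUCKETS)]

print(get_target(b"3200"))
print(len(scan_ranges(b"3200", b"3300")))
```

So the cost of spreading keys falls on sequential scans, not on random gets, which matches the point above about random read access patterns.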
> Explain hotspotting
> -------------------
>
> Key: HBASE-11682
> URL: https://issues.apache.org/jira/browse/HBASE-11682
> Project: HBase
> Issue Type: Task
> Components: documentation
> Reporter: Misty Stanley-Jones
> Assignee: Misty Stanley-Jones
> Attachments: HBASE-11682.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)