[
https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392726#comment-14392726
]
Michael Segel commented on HBASE-12853:
----------------------------------------
Sorry, I thought that the HM had the META table cached in memory. Didn't think
that the META was too large....
Ok, so it looks like what I want to do is all client side then.
The design is pretty straightforward.
The number of buckets is fixed at the time of table creation.
The row key is a composite key of bucket_id | rowkey, and the bucket_id is
derived by taking the first byte of the row key modulo N, where N is the number
of buckets (giving you 0xFF (255) max buckets). Then when you want to fetch a
single row given the rowkey, you can compute the bucket and fetch the single
row. If you need to do a scan, given the start row, you can create N parallel
threads and, within each thread, start the scan by prepending bucket_id | to
the start rowkey. When returning the result set, you strip off the bucket_id |
and take MIN(value(n)), where value(n) is the next row from each scanner,
popping it off that scanner's stack. This gives you a single result set that is
guaranteed to still be in sort order.
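A rough sketch of the scheme described above, in Python for brevity. This is
only an illustration under stated assumptions: a plain dict stands in for the
HBase table, and every name in it (NUM_BUCKETS, salted_key, put, get,
scan_bucket, merged_scan) is hypothetical and not part of any HBase API. It
assumes non-empty row keys and a one-byte bucket_id followed by a literal "|".

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 4  # hypothetical value; fixed when the table is created


def salted_key(rowkey: bytes, n: int = NUM_BUCKETS) -> bytes:
    """Build the stored key: bucket_id | rowkey (bucket = first byte mod n)."""
    return bytes([rowkey[0] % n]) + b"|" + rowkey


def put(table: dict, rowkey: bytes, value: bytes, n: int = NUM_BUCKETS) -> None:
    """Write a row under its bucketed key."""
    table[salted_key(rowkey, n)] = value


def get(table: dict, rowkey: bytes, n: int = NUM_BUCKETS):
    """Single-row fetch: the bucket is recomputed from the row key itself."""
    return table.get(salted_key(rowkey, n))


def scan_bucket(table: dict, bucket: int, start_row: bytes) -> list:
    """Scan one bucket from start_row onward, returning unprefixed keys in order."""
    prefix = bytes([bucket]) + b"|"
    start = prefix + start_row
    keys = sorted(k for k in table if k.startswith(prefix) and k >= start)
    return [(k[len(prefix):], table[k]) for k in keys]


def merged_scan(table: dict, start_row: bytes, n: int = NUM_BUCKETS) -> list:
    """Run one scan per bucket in parallel, then merge by smallest next key."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        per_bucket = list(
            pool.map(lambda b: scan_bucket(table, b, start_row), range(n))
        )
    # Each per-bucket list is already sorted, so a k-way merge that always
    # takes the minimum head element preserves global sort order.
    return list(heapq.merge(*per_bucket, key=lambda kv: kv[0]))
```

With rows written through put(), calling merged_scan(table, b"b") returns the
rows at or after "b" from every bucket as a single list ordered by the original
row key, so the caller never sees the bucket prefix.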
It's all client side and it abstracts the bucketing from the user/client code so
that the same code will run against either table without any changes.
> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
> Key: HBASE-12853
> URL: https://issues.apache.org/jira/browse/HBASE-12853
> Project: HBase
> Issue Type: New Feature
> Reporter: Michael Segel
> Priority: Minor
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is
> that while 'salting' alleviated regional hot spotting, it increased the
> complexity required to utilize the data.
> Through the use of coprocessors, it should be possible to offer a method
> which distributes the data on write across the cluster and then manages
> reading the data returning a sort ordered result set, abstracting the
> underlying process.
> On table creation, a flag is set to indicate that this is a parallel table.
> On insert into the table, if the flag is set to true then a prefix is added
> to the key. e.g. <region server#>- or <region server #|| where the region
> server # is an integer between 1 and the number of region servers defined.
> On read (scan) for each region server defined, a separate scan is created
> adding the prefix. Since each scan will be in sort order, it's possible to
> strip the prefix and return the lowest value key from each of the subsets.