[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291765#comment-14291765 ]
Michael Segel commented on HBASE-12853:
----------------------------------------
Before we go into a design, I need to get a bit more information.
As a practice, I don't review HBase source code; I work from the exposed APIs.
Of course, looking at the HBase API these days is a bit of a CF, since most of
the APIs are deprecated and refer to other deprecated classes / interfaces,
etc., not to mention there are a couple of different releases...
So we start with a Connection instance, from which we get an instance of class
Table for the given table.
Ignoring put() for a moment, we have get() and getScanner() methods.
What happens on the server side of the connection when the client calls
getScanner() or get() ?
Part of the issue is that a simple get() won't work right unless you end up
preprocessing it and treating it as a scanner, but with a default (blank) set
of filters.
So while I can walk you through the logic and give you a resulting diagram, I
need a committer who's familiar with the server-side workings. Then it should
be a pretty straightforward thing to implement.
-Mike
> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
> Key: HBASE-12853
> URL: https://issues.apache.org/jira/browse/HBASE-12853
> Project: HBase
> Issue Type: New Feature
> Reporter: Michael Segel
> Priority: Minor
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is
> that while 'salting' alleviated regional hot spotting, it increased the
> complexity required to utilize the data.
> Through the use of coprocessors, it should be possible to offer a method
> which distributes the data on write across the cluster and then manages
> reading the data returning a sort ordered result set, abstracting the
> underlying process.
> On table creation, a flag is set to indicate that this is a parallel table.
> On insert into the table, if the flag is set to true, a prefix is added
> to the key, e.g. <region server #>- or <region server #>||, where the region
> server # is an integer between 1 and the number of region servers defined.
> On read (scan) for each region server defined, a separate scan is created
> adding the prefix. Since each scan will be in sort order, it's possible to
> strip the prefix and return the lowest-valued key from each of the subsets.
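
The write and read paths described above can be modelled in a few lines. This
is only a rough sketch in plain Python, not HBase code: the ParallelTable
class, the in-memory dict standing in for the table, and the round-robin
choice of prefix are all assumptions made for illustration (the description
does not say how the prefix is picked). It uses the "<region server #>-"
prefix form and a k-way merge of the per-prefix scans.

```python
import heapq


def salted_key(row_key, bucket):
    """Prepend the '<region server #>-' style prefix to a row key."""
    return f"{bucket}-{row_key}"


def strip_prefix(stored_key):
    """Remove the prefix again (everything up to the first '-')."""
    return stored_key.split("-", 1)[1]


class ParallelTable:
    """Hypothetical in-memory stand-in for a table created with the flag set."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets  # number of region servers defined
        self._rows = {}                 # stored (prefixed) key -> value
        self._writes = 0

    def put(self, row_key, value):
        # Assumption: the prefix is chosen round-robin over 1..num_buckets.
        bucket = self._writes % self.num_buckets + 1
        self._writes += 1
        self._rows[salted_key(row_key, bucket)] = value

    def _scan_bucket(self, bucket):
        # One scan per prefix; rows within a prefix come back in sort order.
        prefix = f"{bucket}-"
        for stored in sorted(k for k in self._rows if k.startswith(prefix)):
            yield strip_prefix(stored), self._rows[stored]

    def scan(self):
        # Merge the per-prefix scans, always yielding the lowest key next.
        scans = [self._scan_bucket(b) for b in range(1, self.num_buckets + 1)]
        return heapq.merge(*scans, key=lambda kv: kv[0])


table = ParallelTable(num_buckets=3)
for key in ["row05", "row01", "row03", "row02", "row04"]:
    table.put(key, key.upper())

print([k for k, _ in table.scan()])
```

The caller only ever sees unprefixed keys in sort order, which is the
abstraction the description asks for. Note that with a round-robin prefix a
point lookup cannot know which prefix a row landed under, so in this sketch it
would have to check every prefix.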