[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646617#comment-14646617 ]

Michael Segel commented on HBASE-12853:
----------------------------------------

@Anoop,

Yes, that is correct. 
It was my misunderstanding of where the client/server boundary falls. 
(I program to the APIs and don't look at the source code.) 

I believe I did mention this after your last post correcting my mistake.

Again, this is pretty simple... you're overloading scan() so that it first 
checks whether the underlying table is bucketed. A simple way to do this is 
to check the number of buckets. If it's 0, the table is not bucketed and you 
just run the scan as normal. If it is a positive integer, you would then 
parallelize the scan across the buckets.
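A minimal client-side sketch of that check, in Java since HBase's client is Java. It does not call the HBase API; `BucketedScanPlanner`, `RowRange`, the `<bucket>-` prefix and the 1..N bucket numbering are illustrative assumptions (the prefix follows the `<region server#>-` form in the issue description). Row keys are modelled as `String`s, where a real implementation would use `byte[]`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical client-side helper: decides whether a scan must be fanned out
 * across buckets and, if so, computes one prefixed row range per bucket.
 */
public final class BucketedScanPlanner {

    /** Half-open row range [startRow, stopRow); an empty stopRow means "no upper bound". */
    public static final class RowRange {
        public final String startRow;
        public final String stopRow;

        RowRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + stopRow + ")";
        }
    }

    /**
     * @param bucketCount buckets recorded for the table; 0 means "not bucketed"
     * @param userStart   caller's start row ("" = start of table)
     * @param userStop    caller's stop row ("" = end of table)
     */
    public static List<RowRange> plan(int bucketCount, String userStart, String userStop) {
        if (bucketCount < 0) {
            throw new IllegalArgumentException("bucketCount must be >= 0");
        }
        List<RowRange> ranges = new ArrayList<>();
        if (bucketCount == 0) {
            // Not bucketed: run the caller's scan unchanged.
            ranges.add(new RowRange(userStart, userStop));
            return ranges;
        }
        // Bucketed: one range scan per bucket, each confined to its "<n>-" prefix.
        for (int bucket = 1; bucket <= bucketCount; bucket++) {
            String prefix = bucket + "-";
            // '.' sorts immediately after '-', so an exclusive stop row of "<n>."
            // covers every "<n>-" key and no key from any other bucket.
            String stop = userStop.isEmpty() ? bucket + "." : prefix + userStop;
            ranges.add(new RowRange(prefix + userStart, stop));
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(plan(0, "a", "m")); // [[a, m)]
        System.out.println(plan(3, "a", ""));  // [[1-a, 1.), [2-a, 2.), [3-a, 3.)]
    }
}
```

Keeping each bucket's range bounded by its own prefix is what keeps these range scans: with this scheme bucket 1's range `[1-, 1.)` never picks up rows from bucket 10.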

You would then need to wait until all of the result sets return before you can 
funnel the data into a single result set to be returned to the user. 

Of course, I'm assuming that each result set will start to send back results 
before its scan has completed. 
Note too that these will be range scans. 
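Funnelling the per-bucket result sets into one sorted result set is a k-way merge. The sketch below uses a hypothetical class, `BucketMerger` (not an HBase API), that keeps only the current head row of each bucket's stream in a priority queue, strips the `<bucket>-` prefix, and emits the smallest row each time. It needs just the first row from each stream before it can start returning data, which fits the assumption that results start coming back before each scan finishes. `String.compareTo` stands in for HBase's unsigned byte comparison; the two agree for ASCII keys.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch: merge per-bucket sorted row streams into one sorted stream. */
public class BucketMerger implements Iterator<String> {

    /** Head row of one bucket stream, with the bucket prefix already stripped. */
    private static final class Head {
        final String row;
        final Iterator<String> source;

        Head(String row, Iterator<String> source) {
            this.row = row;
            this.source = source;
        }
    }

    private final PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> a.row.compareTo(b.row));

    public BucketMerger(List<Iterator<String>> bucketStreams) {
        for (Iterator<String> it : bucketStreams) {
            advance(it);
        }
    }

    /** Pull the next row from a stream (if any), strip "<n>-", and push it onto the heap. */
    private void advance(Iterator<String> it) {
        if (it.hasNext()) {
            String prefixed = it.next();
            int dash = prefixed.indexOf('-');
            heap.add(new Head(prefixed.substring(dash + 1), it));
        }
    }

    @Override
    public boolean hasNext() {
        return !heap.isEmpty();
    }

    @Override
    public String next() {
        Head smallest = heap.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        advance(smallest.source); // only the head of each stream is ever buffered
        return smallest.row;
    }

    public static void main(String[] args) {
        List<Iterator<String>> streams = List.of(
                List.of("1-apple", "1-melon").iterator(),
                List.of("2-banana", "2-kiwi").iterator(),
                List.of("3-cherry").iterator());
        List<String> merged = new ArrayList<>();
        new BucketMerger(streams).forEachRemaining(merged::add);
        System.out.println(merged); // [apple, banana, cherry, kiwi, melon]
    }
}
```

Stripping the prefix before comparing is safe because every row within one stream carries the same prefix, so each stream's order is unchanged.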

One other side effect is that if the scan is a full table scan... things will 
get a bit messy. (Well, maybe not...)

> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
>                 Key: HBASE-12853
>                 URL: https://issues.apache.org/jira/browse/HBASE-12853
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Michael Segel 
>             Fix For: 2.0.0
>
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is 
> that while 'salting' alleviated regional hot spotting, it increased the 
> complexity required to utilize the data. 
> Through the use of coprocessors, it should be possible to offer a method 
> which distributes the data across the cluster on write and then manages 
> reading the data, returning a sort-ordered result set and abstracting the 
> underlying process. 
> On table creation, a flag is set to indicate that this is a parallel table. 
> On insert into the table, if the flag is set to true, then a prefix is added 
> to the key, e.g. <region server#>- or <region server #|| where the region 
> server # is an integer between 1 and the number of region servers defined. 
> On read (scan), a separate scan is created for each region server defined, 
> adding the prefix. Since each scan will be in sort order, it's possible to 
> strip the prefix and return, at each step, the lowest-valued key across all 
> of the subsets. 
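The write side of the proposal can be sketched the same way. The description does not say how the prefix number is chosen for a given row; this sketch assumes it is derived from a hash of the row key modulo the bucket count, so a point get can recompute the bucket instead of probing all of them. `BucketPrefixer` and its methods are hypothetical names, and keys are `String`s rather than HBase `byte[]` row keys.

```java
/** Hypothetical sketch of write-side key prefixing ("salting") for a parallel table. */
public final class BucketPrefixer {
    private final int bucketCount;

    public BucketPrefixer(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be > 0");
        }
        this.bucketCount = bucketCount;
    }

    /** Bucket in [1, bucketCount]; assumed to be hash-derived so reads can recompute it. */
    public int bucketFor(String rowKey) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(rowKey.hashCode(), bucketCount) + 1;
    }

    /** Key as physically stored: "<bucket>-<rowKey>". */
    public String toStoredKey(String rowKey) {
        return bucketFor(rowKey) + "-" + rowKey;
    }

    /** Inverse of toStoredKey: drop the digits and the first '-' that follows them. */
    public static String toUserKey(String storedKey) {
        int dash = storedKey.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("not a bucketed key: " + storedKey);
        }
        return storedKey.substring(dash + 1);
    }

    public static void main(String[] args) {
        BucketPrefixer p = new BucketPrefixer(4);
        for (String key : new String[] {"user001", "user002", "user003"}) {
            String stored = p.toStoredKey(key);
            System.out.println(key + " -> " + stored + " -> " + toUserKey(stored));
        }
    }
}
```

Because the prefix is all digits, the first '-' is always the separator, so user keys that themselves contain '-' round-trip correctly. A round-robin or random prefix would spread writes just as well, but a get for a single key would then have to check every bucket.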



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
