[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'

Michael Segel (JIRA) Fri, 16 Jan 2015 08:23:22 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280461#comment-14280461
 ]


Michael Segel  commented on HBASE-12853:
----------------------------------------

"An implemented one is OneBytePrefixKeySalter, where the prefix is 
hash(RowKey)%buckets" 

That's fine. But now if I have another client, I have to know that the table is 
bucketed. (Yes, I am refusing to use the term 'salt' when talking about this. 
:-) )

And not only do you need to know that the table is bucketed, you need to know 
the number of buckets.  You are also assuming that the individual is using a 
Java application to query the data, and that they've got the Intel library. 
What happens if they are not? 

If it's done server side, all of that goes away.
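
To make the client-side burden concrete, here is a minimal sketch of a one-byte
bucket prefix of the form hash(RowKey)%buckets. It is not the
OneBytePrefixKeySalter implementation: the function names are hypothetical and
CRC32 is only an assumed stand-in for whatever hash the real salter uses. The
point is that a reader rebuilds the stored key only if it uses the same hash
and the same bucket count as the writer.

```python
import zlib


def bucket_prefix(row_key: bytes, buckets: int) -> bytes:
    """Return a one-byte prefix computed as hash(row_key) % buckets.

    CRC32 is an assumption made for this sketch, not the hash HBase uses.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("a one-byte prefix supports 1..256 buckets")
    return bytes([zlib.crc32(row_key) % buckets])


def bucketed_key(row_key: bytes, buckets: int) -> bytes:
    """Prepend the bucket prefix to the logical row key."""
    return bucket_prefix(row_key, buckets) + row_key


# The writer stores rows using 16 buckets.
stored = {bucketed_key(b"user%d" % i, 16) for i in range(100)}

# A reader that knows the bucket count rebuilds every stored key.
found_with_16 = sum(bucketed_key(b"user%d" % i, 16) in stored for i in range(100))

# A reader that guesses 8 buckets computes a different prefix for some keys
# and misses those rows on a point lookup.
found_with_8 = sum(bucketed_key(b"user%d" % i, 8) in stored for i in range(100))
```

Every client, in every language, has to carry the hash function and the bucket
count around with it, which is exactly the coupling a server-side approach
would remove.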

> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
>                 Key: HBASE-12853
>                 URL: https://issues.apache.org/jira/browse/HBASE-12853
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Michael Segel 
>            Priority: Minor
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is 
> that while 'salting' alleviated regional hot spotting, it increased the 
> complexity required to utilize the data.  
> Through the use of coprocessors, it should be possible to offer a method 
> which distributes the data on write across the cluster and then manages 
> reading the data, returning a sort-ordered result set and abstracting the 
> underlying process. 
> On table creation, a flag is set to indicate that this is a parallel table. 
> On insert into the table, if the flag is set to true then a prefix is added 
> to the key.  e.g. <region server#>- or <region server #|| where the region 
> server # is an integer between 1 and the number of region servers defined.  
> On read (scan), for each region server defined, a separate scan is created 
> adding the prefix. Since each scan will be in sort order, it's possible to 
> strip the prefix and return the lowest value key from each of the subsets. 
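
The read path described in the issue amounts to a k-way merge over the
per-prefix scans. The sketch below illustrates only that merge step, under
stated assumptions: plain in-memory lists stand in for the per-prefix scanners,
the prefix is assumed to be a single byte, and all names are hypothetical. It
does not use the HBase or coprocessor API.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Row = Tuple[bytes, bytes]  # (row key, value)


def strip_prefix(scan: Iterable[Row], prefix_len: int) -> Iterator[Row]:
    """Yield rows from one prefixed scan with the prefix removed from each key."""
    for key, value in scan:
        yield key[prefix_len:], value


def merged_scan(prefixed_scans: Iterable[Iterable[Row]],
                prefix_len: int = 1) -> Iterator[Row]:
    """Merge per-prefix scans into one stream ordered by the original key.

    Each input scan is already sorted, and all of its keys share one prefix,
    so it stays sorted after the prefix is stripped. heapq.merge then
    repeatedly yields the lowest remaining key across all scans.
    """
    stripped = [strip_prefix(scan, prefix_len) for scan in prefixed_scans]
    return heapq.merge(*stripped, key=lambda row: row[0])


# Three hypothetical per-prefix scans, each in sort order.
scans = [
    [(b"\x01apple", b"a"), (b"\x01melon", b"m")],
    [(b"\x02banana", b"b")],
    [(b"\x03cherry", b"c"), (b"\x03zebra", b"z")],
]
merged_keys = [key for key, _ in merged_scan(scans)]
```

Because the merge is lazy, a caller only pulls as many rows from each scan as
it needs, so the client sees one ordered result set without knowing how many
prefixes exist.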



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
