[ 
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594273#comment-13594273
 ] 

Devaraj Das commented on HBASE-4755:
------------------------------------

{quote}
The difficult point is to choose the third RS now: we've got one missing. Some 
comments:
-> We now have 2 RS on the same rack. So the config will be primary & secondary 
on the same rack and tertiary on another (not ideal).
-> We can imagine a situation where the first RS will come back to life soon 
(rolling restart for example).
{quote}

Hmm.. We should designate an RS in a different rack (all new store files would 
go to that node, and all existing data would eventually get to that node via 
compactions). For the rolling restart case, it should be fine since the meta 
assignments wouldn't change and when the primary comes back to life, the 
regions (probably currently assigned to the secondary) would be reassigned. But 
yeah, I see that the loadbalancer would probably have to be aware of the 
rolling restart situation so that it doesn't prematurely assume certain RSs are 
really "down" and take (wasteful) corrective actions.
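
To make the "pick an RS in a different rack" idea concrete, here is a minimal sketch in plain Java (class and method names are illustrative, not HBase's actual balancer API): when one favored RS is lost, prefer a replacement on a rack the surviving favored nodes don't already occupy, falling back to any candidate if no fresh rack exists.

```java
import java.util.*;

public class RackAwarePicker {
    // Picks a replacement favored regionserver on a rack not already used by
    // the surviving favored nodes; falls back to the first candidate if every
    // candidate sits on an already-used rack. Illustrative only.
    public static String pickReplacement(Map<String, String> serverToRack,
                                         List<String> survivors,
                                         List<String> candidates) {
        Set<String> usedRacks = new HashSet<>();
        for (String s : survivors) {
            usedRacks.add(serverToRack.get(s));
        }
        for (String c : candidates) {
            if (!usedRacks.contains(serverToRack.get(c))) {
                return c;  // first candidate on a rack we don't use yet
            }
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> racks = new HashMap<>();
        racks.put("rs2", "rackA");
        racks.put("rs3", "rackB");
        racks.put("rs4", "rackC");
        racks.put("rs5", "rackA");
        // rs2 (rackA) and rs3 (rackB) survive; rs5 shares rackA, rs4 is on
        // the unused rackC, so rs4 is chosen.
        System.out.println(pickReplacement(racks,
                List.of("rs2", "rs3"), List.of("rs5", "rs4")));
    }
}
```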

bq. We may have a first step in which we just go to the same servers for WAL & 
newly created HFiles. 

Hmm.. good point. Will tackle WALs as a subtask of this jira.

The patch in HBASE-7932 does a major part of the work of getting the location 
information into the meta table and sending it down to the RS. I need to use 
the API (HDFS-2576) in HBASE-7942 to actually create the files on specific 
nodes. The balancer work would be a separate subtask.
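
On the HDFS-2576 hook-up, one small piece the RS side would need is turning the favored regionserver hostnames stored in meta into socket addresses to hand to the HDFS client's favored-nodes create call. A hedged sketch in plain Java (the class name, helper name, and datanode port are illustrative assumptions; the actual HDFS create signature is not reproduced here):

```java
import java.net.InetSocketAddress;
import java.util.List;

public class FavoredNodesHint {
    // Converts a region's favored regionserver hostnames (as stored in meta)
    // into the InetSocketAddress[] hint that a favored-nodes-aware HDFS
    // create call would accept. The port is an assumed illustrative value,
    // not read from configuration.
    public static InetSocketAddress[] toFavoredNodes(List<String> hosts,
                                                     int dataNodePort) {
        InetSocketAddress[] favored = new InetSocketAddress[hosts.size()];
        for (int i = 0; i < hosts.size(); i++) {
            // createUnresolved avoids a DNS lookup while building the hint
            favored[i] = InetSocketAddress.createUnresolved(hosts.get(i),
                    dataNodePort);
        }
        return favored;
    }
}
```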
                
> HBase based block placement in DFS
> ----------------------------------
>
>                 Key: HBASE-4755
>                 URL: https://issues.apache.org/jira/browse/HBASE-4755
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.0
>            Reporter: Karthik Ranganathan
>            Assignee: Christopher Gist
>            Priority: Critical
>         Attachments: 4755-wip-1.patch, hbase-4755-notes.txt
>
>
> This feature as such is only useful for HBase clusters that care about data 
> locality on regionservers, but it can also enable a lot of nice features 
> down the road.
> The basic idea is as follows: instead of letting HDFS determine where to 
> replicate data (r=3) by placing blocks on various nodes, it is better to let 
> HBase do so by providing hints to HDFS through the DFS client. That way, 
> instead of replicating data at the block level, we can replicate data at a 
> per-region level (each region owned by a primary, a secondary and a tertiary 
> regionserver). This is better for 2 things:
> - Can make region failover faster on clusters which benefit from data affinity
> - On large clusters with random block placement policy, this helps reduce the 
> probability of data loss
> The algo is as follows:
> - Each region in META will have 3 columns which are the preferred 
> regionservers for that region (primary, secondary and tertiary)
> - Preferred assignment can be controlled by a config knob
> - Upon cluster start, the HMaster will record a mapping from each region to 3 
> regionservers (random hash, could use current locality, etc.)
> - The load balancer would assign out regions preferring region assignments to 
> primary over secondary over tertiary over any other node
> - Periodically (say weekly, configurable) the HMaster would run a locality 
> check and make sure the map it has from regions to regionservers is optimal.
> Down the road, this can be enhanced to control region placement in the 
> following cases:
> - Mixed hardware SKU where some regionservers can hold fewer regions
> - Load balancing across tables, where we don't want multiple regions of a 
> table to get assigned to the same regionservers
> - Multi-tenancy, where we can restrict the assignment of the regions of some 
> table to a subset of regionservers, so an abusive app cannot take down the 
> whole HBase cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
