[ https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591101#comment-13591101 ]

Devaraj Das commented on HBASE-4755:
------------------------------------

Thanks [~jmhsieh] for the review. Responses to your comments/questions below:

bq. An assignmentDomain is not what I've been calling an affinity group over in 
HDFS-2576 – it is the set of possible DN's to assign to, yeah? Are assignment 
domains persisted or just queried when creating tables (how does this info come 
from hdfs)?

Yes, AssignmentDomain is a set of nodes you carve out based on some policies. 
Currently it is the whole cluster. I think work needs to be done to make it a 
useful abstraction for multi-tenancy, load balancing, etc., and no, they are 
not persisted as of now. 

bq. Table creation was recently changed with the inclusion of HBASE-7365. 
Probably need to figure out how to thread the AssignmentDomain stuff in; this 
has interesting follow-on work for snapshot clones. (snapshots should likely 
be ignored in the first cut)

I don't think this matters in the short term. We are mostly talking about the 
real data in the context of AssignmentDomain as opposed to metadata (about file 
paths and so on).

bq. Step 3 – this is done in assignment manager?

Yes. An assign() method that takes a Map<HRegionInfo, List<ServerName>> has 
been introduced. The method internally assigns each region in the map to the 
0th element (the primary RS) of its list. This flow works well for newly 
created (pre-split) tables. I need to see how the flow of 
"enableTable -> (after some time)disableTable -> (after some time)enableTable" 
works in this context.
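As a rough illustration of that flow (the names below are illustrative, not the actual HBase API; regions are keyed by name here rather than by HRegionInfo): for each region in the plan, the 0th entry of its favored list is taken as the primary regionserver.

```java
import java.util.*;

public class FavoredAssignSketch {
    // Hypothetical helper mirroring the assign() behavior described above:
    // for each region, the 0th element of the favored list is the primary RS.
    static Map<String, String> pickPrimaries(Map<String, List<String>> plan) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : plan.entrySet()) {
            List<String> favored = e.getValue();
            // Skip regions with no favored servers rather than failing.
            if (favored != null && !favored.isEmpty()) {
                result.put(e.getKey(), favored.get(0)); // 0th element = primary
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put("region-A", Arrays.asList("rs1:16020", "rs2:16020", "rs3:16020"));
        plan.put("region-B", Arrays.asList("rs2:16020", "rs3:16020", "rs1:16020"));
        System.out.println(pickPrimaries(plan));
        // prints {region-A=rs1:16020, region-B=rs2:16020}
    }
}
```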

bq. recovery done by assignment manager?

Yes, AM and SSH, I'd think. I haven't thought much about this part yet.

bq. What does the maintenance tool interact with and need to see – the new meta 
table cols, the master, the assignment domain? Is this maintenance tool the 
only place other than creation time that changes the preferred DN's? Should 
there be commands to manually override the DN choices? What would these be, 
roughly, for surgery purposes?

The tool would interact with the meta table cols and the AssignmentDomain 
(although in the first implementation, the AssignmentDomain is the whole 
cluster). Yes, this is the only place other than table creation that would 
change the preferred DNs. Agree with manual overrides (is that what you meant 
by surgery?).

bq. For natural splits – for the first implementation – what is the story? (no 
preferred DN's specified? copies parent's preferred DN's?) If there are no 
DN's specified, or if the specified DN's are invalid, we "fall back" to the 
old random policies, yeah?

I think we can do better than random policies; copying the parent's preferred 
DNs seems fine as a start as well. I'll address this at some point.
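A minimal sketch of that split-time fallback, under the assumptions discussed above (the class and method names are hypothetical, not from the patch): a daughter region inherits the parent's preferred DNs that are still valid, and falls back to a random choice when nothing usable is inherited.

```java
import java.util.*;

public class SplitFavoredNodes {
    // Hypothetical: compute a daughter region's favored nodes from its
    // parent's list, validated against the set of live servers.
    static List<String> favoredForDaughter(List<String> parentFavored,
                                           Set<String> liveServers,
                                           Random rng) {
        List<String> valid = new ArrayList<>();
        if (parentFavored != null) {
            for (String s : parentFavored) {
                if (liveServers.contains(s)) valid.add(s); // keep only live nodes
            }
        }
        if (!valid.isEmpty()) return valid; // copy parent's preferred DNs

        // Fall back to the old random policy: pick up to 3 live servers.
        List<String> pool = new ArrayList<>(liveServers);
        Collections.shuffle(pool, rng);
        return pool.subList(0, Math.min(3, pool.size()));
    }

    public static void main(String[] args) {
        Set<String> live = new LinkedHashSet<>(Arrays.asList("rs1", "rs2", "rs3", "rs4"));
        // rs9 is not live, so only rs1 survives from the parent's list.
        System.out.println(favoredForDaughter(Arrays.asList("rs1", "rs9"), live, new Random(0)));
        // prints [rs1]
    }
}
```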

                
> HBase based block placement in DFS
> ----------------------------------
>
>                 Key: HBASE-4755
>                 URL: https://issues.apache.org/jira/browse/HBASE-4755
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.0
>            Reporter: Karthik Ranganathan
>            Assignee: Christopher Gist
>            Priority: Critical
>         Attachments: 4755-wip-1.patch, hbase-4755-notes.txt
>
>
> As is, the feature is only useful for HBase clusters that care about data 
> locality on regionservers, but it can also enable a lot of nice features 
> down the road.
> The basic idea is as follows: instead of letting HDFS determine where to 
> replicate data (r=3) by placing blocks on various nodes, it is better to let 
> HBase do so by providing hints to HDFS through the DFS client. That way, 
> instead of replicating data at the block level, we can replicate data at a 
> per-region level (each region owned by a primary, a secondary and a tertiary 
> regionserver). This is better for two reasons:
> - Can make region failover faster on clusters which benefit from data affinity
> - On large clusters with random block placement policy, this helps reduce the 
> probability of data loss
> The algo is as follows:
> - Each region in META will have 3 columns which are the preferred 
> regionservers for that region (primary, secondary and tertiary)
> - Preferred assignment can be controlled by a config knob
> - Upon cluster start, HMaster will enter a mapping from each region to 3 
> regionservers (random hash, could use current locality, etc)
> - The load balancer would assign out regions preferring region assignments to 
> primary over secondary over tertiary over any other node
> - Periodically (say weekly, configurable) the HMaster would run a locality 
> check and make sure the map it has from regions to regionservers is optimal.
> Down the road, this can be enhanced to control region placement in the 
> following cases:
> - Mixed hardware SKU where some regionservers can hold fewer regions
> - Load balancing across tables where we don't want multiple regions of a 
> table to get assigned to the same regionservers
> - Multi-tenancy, where we can restrict the assignment of the regions of some 
> table to a subset of regionservers, so an abusive app cannot take down the 
> whole HBase cluster.
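
The load-balancer preference ordering described in the quoted algorithm (primary over secondary over tertiary over any other node) could be sketched roughly as follows; the class and method names are illustrative, not HBase code:

```java
import java.util.*;

public class PlacementPreference {
    // Hypothetical ranking: index in the favored list (0 = primary,
    // 1 = secondary, 2 = tertiary), or one past the end for any other node.
    static int rank(String server, List<String> favored) {
        int idx = favored.indexOf(server);
        return idx >= 0 ? idx : favored.size();
    }

    // Pick the most-preferred candidate regionserver for a region.
    static String choose(List<String> candidates, List<String> favored) {
        return Collections.min(candidates,
                Comparator.comparingInt((String s) -> rank(s, favored)));
    }

    public static void main(String[] args) {
        List<String> favored = Arrays.asList("rs1", "rs2", "rs3");
        // The primary (rs1) is not a candidate; the secondary beats an
        // unrelated node.
        System.out.println(choose(Arrays.asList("rs4", "rs3", "rs2"), favored));
        // prints rs2
    }
}
```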

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
