[ 
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604029#comment-13604029
 ] 

Gary Helmling commented on HBASE-4755:
--------------------------------------

bq. replaying the log. Historically this has been the biggest problem, but 
should be getting better. And with Jeffrey's HBASE-7835, I would expect this 
becomes even better. We are not doing local reads to the log files here, and 
Sanjay's proposal does not help in this case. But the data is supposed to be 
one-two orders of magnitude smaller than in hfiles.

This could also be handled by HDFS-2576 if we allowed a WAL per region, an idea 
with its own pros and cons that we probably shouldn't get into here.  So this 
is not intractable.

bq. filling up the block cache: I think this is the biggest cost we are paying 
for online serving. HDFS improvements won't help us here.

This assumes a cacheable working set, which is not always the case.  Of course 
the block cache will help mask the cost of remote reads when you have good 
cache affinity.

I also agree that Sanjay's API is worth pursuing.  I think it was something 
that was actually proposed way back with the idea of in-process Lucene indexing 
with HBASE-3529, but was unfortunately shot down.  I also agree that this would 
be an improvement over the current situation, but the time to pull blocks local 
can still be a painful cost, especially in the case of larger multi-node 
failures.

I do agree that HDFS-2576 comes at a cost of complexity, especially in terms of 
interactions with the HDFS balancer and normal decommissioning or 
under-replication handling.  But I think closing that final gap for immediate 
locality will make HBase more attractive for certain scenarios.
                
> HBase based block placement in DFS
> ----------------------------------
>
>                 Key: HBASE-4755
>                 URL: https://issues.apache.org/jira/browse/HBASE-4755
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.0
>            Reporter: Karthik Ranganathan
>            Assignee: Christopher Gist
>            Priority: Critical
>         Attachments: 4755-wip-1.patch, hbase-4755-notes.txt
>
>
> As is, the feature is only useful for HBase clusters that care about data 
> locality on regionservers, but this feature can also enable a lot of nice 
> features down the road.
> The basic idea is as follows: instead of letting HDFS determine where to 
> replicate data (r=3) by placing blocks on various regions, it is better to let 
> HBase do so by providing hints to HDFS through the DFS client. That way 
> instead of replicating data at a block level, we can replicate data at a 
> per-region level (each region owned by a primary, a secondary and a tertiary 
> regionserver). This is better for 2 things:
> - Can make region failover faster on clusters which benefit from data affinity
> - On large clusters with random block placement policy, this helps reduce the 
> probability of data loss
> The algo is as follows:
> - Each region in META will have 3 columns which are the preferred 
> regionservers for that region (primary, secondary and tertiary)
> - Preferred assignment can be controlled by a config knob
> - Upon cluster start, HMaster will enter a mapping from each region to 3 
> regionservers (random hash, could use current locality, etc)
> - The load balancer would assign out regions preferring region assignments to 
> primary over secondary over tertiary over any other node
> - Periodically (say weekly, configurable) the HMaster would run a locality 
> check and make sure the map it has for region to regionservers is optimal.
> Down the road, this can be enhanced to control region placement in the 
> following cases:
> - Mixed hardware SKU where some regionservers can hold fewer regions
> - Load balancing across tables where we don't want multiple regions of a table 
> to get assigned to the same regionservers
> - Multi-tenancy, where we can restrict the assignment of the regions of some 
> table to a subset of regionservers, so an abusive app cannot take down the 
> whole HBase cluster.
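
The assignment preference described in the quoted algorithm (primary over 
secondary over tertiary over any other node) could be sketched roughly as 
follows. This is only an illustration in Python, not HBase code: the function 
names, the random initial mapping and the "pick the lowest-named live server" 
fallback are all assumptions made for the example.

```python
import random


def initial_mapping(regions, servers, seed=0):
    """Map each region to three distinct preferred servers.

    Hypothetical helper: the order of the tuple is (primary, secondary,
    tertiary). Uses a random choice, one of the options the issue mentions.
    Raises ValueError if fewer than three servers are available.
    """
    rng = random.Random(seed)
    return {region: tuple(rng.sample(servers, 3)) for region in regions}


def choose_server(preferred, live_servers):
    """Pick a server for a region, honoring the preference order.

    Returns the first preferred server that is live; if none is, falls
    back to an arbitrary live server (here: the lowest name, an assumption
    made only to keep the example deterministic). Returns None when no
    server is live.
    """
    for server in preferred:
        if server in live_servers:
            return server
    return min(live_servers) if live_servers else None


if __name__ == "__main__":
    servers = ["rs1", "rs2", "rs3", "rs4", "rs5"]
    mapping = initial_mapping(["regionA", "regionB"], servers)
    live = set(servers) - {mapping["regionA"][0]}  # regionA's primary is down
    print(choose_server(mapping["regionA"], live))  # regionA's secondary
```

With the primary down, the region lands on its secondary, which under the 
proposal would already hold a replica of the region's blocks.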

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
