[
https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603947#comment-13603947
]
Devaraj Das commented on HBASE-4755:
------------------------------------
I was discussing this topic with [~sanjay.radia] since there is an HDFS
dependency. He came up with another idea on the HDFS side (which he plans to
implement soon), and it seemed good to me. When an RS failure happens, a random
RS picks a region up (as usual). While the region is being served by that RS,
HDFS transparently replicates the associated blocks onto that node. Eventually,
the remote blocks become local. To start with, HDFS could expose one simple
API, makeBlocksLocal(Path). When a client accesses the region, the RS serves it
as usual but also makes this API call for the HFile paths that the region is
comprised of. The copying of the blocks happens in the background.
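A minimal sketch of how an RS might use such a call when it opens a region picked
up from a failed server; makeBlocksLocal(Path) is only the API proposed above, not
something HDFS exposes today, and the helper below is purely illustrative:
{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class RegionBlockLocalizer {
  // Purely illustrative: makeBlocksLocal(Path) is the API proposed in this
  // comment and does not exist in HDFS yet. The idea is that the RS calls it
  // once per HFile after opening a region it picked up from a failed server.
  void localizeRegionBlocks(DistributedFileSystem dfs, List<Path> hfilePaths)
      throws IOException {
    for (Path hfile : hfilePaths) {
      // Fire-and-forget hint: HDFS re-replicates this file's blocks onto the
      // local datanode in the background; reads turn local once the copies finish.
      dfs.makeBlocksLocal(hfile);
    }
  }
}
{code}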
The pro is that it is simple to reason about and doesn't intrude much into
HBase. In the current approach (via HDFS-2576), on a failure the new RS would
already have all the blocks local, but it requires HBase to periodically verify
that the locality maps (in the meta table) are still optimal (since nodes go
down, etc.) and perhaps reassign regions based on their degree of locality
w.r.t. the datanodes.
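For comparison, a minimal sketch of the write-time hint in the HDFS-2576 style;
the favored-nodes create overload and its exact signature are an assumption here,
based on what HDFS-2576 proposes rather than a settled API:
{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class FavoredNodesWriteSketch {
  // Sketch of the HDFS-2576 approach: HBase passes the datanodes co-located
  // with the region's primary/secondary/tertiary RS when creating an HFile,
  // so the replicas of every block land on those nodes.
  FSDataOutputStream createHFile(DistributedFileSystem dfs, Path hfile,
      InetSocketAddress[] favoredNodes) throws IOException {
    return dfs.create(hfile, FsPermission.getDefault(),
        true,               // overwrite
        64 * 1024,          // buffer size
        (short) 3,          // replication = number of favored nodes
        128 * 1024 * 1024,  // block size
        null,               // no Progressable
        favoredNodes);      // placement hint from HBase (HDFS-2576)
  }
}
{code}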
With the makeBlocksLocal approach, we don't have to wait for a compaction to
rewrite the HFile data locally on the new RS.
The con is that when a failure happens, there may be significant network
traffic while the blocks of the accessed regions are being localized onto the
new RSs for those regions (although many of the blocks may already be rack-local
to the new RSs). This would not be worse than what happens today, though.
What do people think about the above?
> HBase based block placement in DFS
> ----------------------------------
>
> Key: HBASE-4755
> URL: https://issues.apache.org/jira/browse/HBASE-4755
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.0
> Reporter: Karthik Ranganathan
> Assignee: Christopher Gist
> Priority: Critical
> Attachments: 4755-wip-1.patch, hbase-4755-notes.txt
>
>
> As is, the feature is only useful for HBase clusters that care about data
> locality on regionservers, but it can also enable a lot of nice features down
> the road.
> The basic idea is as follows: instead of letting HDFS determine where to
> replicate data (r=3) by placing blocks on various nodes, it is better to let
> HBase do so by providing hints to HDFS through the DFS client. That way,
> instead of replicating data at the block level, we can replicate data at a
> per-region level (each region owned by a primary, a secondary and a tertiary
> regionserver). This is better for two reasons:
> - Can make region failover faster on clusters that benefit from data affinity
> - On large clusters with a random block placement policy, this helps reduce the
> probability of data loss
> The algorithm is as follows:
> - Each region in META will have 3 columns listing the preferred
> regionservers for that region (primary, secondary and tertiary)
> - Preferred assignment can be controlled by a config knob
> - Upon cluster start, the HMaster will enter a mapping from each region to 3
> regionservers (random hash, could use current locality, etc.)
> - The load balancer would assign regions preferring the primary over the
> secondary over the tertiary over any other node
> - Periodically (say weekly, configurable) the HMaster would run a locality
> check and make sure its region-to-regionserver map is optimal.
> Down the road, this can be enhanced to control region placement in the
> following cases:
> - Mixed hardware SKUs where some regionservers can hold fewer regions
> - Load balancing across tables where we don't want multiple regions of a table
> to get assigned to the same regionserver
> - Multi-tenancy, where we can restrict the assignment of the regions of some
> table to a subset of regionservers, so an abusive app cannot take down the
> whole HBase cluster.
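For illustration, a rough sketch of the preferred-assignment order the quoted
description outlines; the class and method names below are hypothetical, not
actual HBase code:
{code:java}
import java.util.List;

class PreferredAssignmentSketch {
  // Hypothetical helper: choose the destination RS for a region using the
  // primary/secondary/tertiary preference recorded in META, falling back to
  // any live server when none of the preferred ones is available.
  static String chooseServer(List<String> preferredServers, // primary, secondary, tertiary
      List<String> liveServers) {
    for (String preferred : preferredServers) {
      if (liveServers.contains(preferred)) {
        return preferred;
      }
    }
    // None of the preferred servers is up; let the balancer pick any live RS.
    return liveServers.isEmpty() ? null : liveServers.get(0);
  }
}
{code}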
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira