[
https://issues.apache.org/jira/browse/HBASE-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392667#comment-14392667
]
Kevin Odell commented on HBASE-13323:
-------------------------------------
Andrew,
I do agree that HDFS-7967 will add more complexity to the HDFS balancer, but
as I recently noted on the mailing list, it is still a manual process. The
HDFS balancer is rarely run, usually only when nodes are added or removed. I am
just throwing this out, but a reasonably simple implementation would be to add
a way to limit the number of regions per RS. This way you could allow for more
regions to be served off of larger nodes. This would help to limit the
capacity used by the small storage nodes. The main difficulty with this
approach would be handling failure scenarios; I am thinking of losing a rack.
That could create an influx of regions to the surviving nodes, causing them to
run over their limit. The setting could be "best effort", similar to how slop
worked for the table balancer.
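
To make the idea concrete, here is a rough sketch of a per-RS region limit
treated as a best-effort cap. It is not HBase code: the names RegionServer,
max_regions, pick_server and assign are hypothetical and exist only for
illustration, and max_regions is assumed to be a positive number set by the
operator according to node size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegionServer:
    name: str
    max_regions: int  # hypothetical soft cap, assumed > 0
    regions: List[str] = field(default_factory=list)

    def load_ratio(self) -> float:
        # Fraction of the soft cap currently in use (may exceed 1.0).
        return len(self.regions) / self.max_regions


def pick_server(servers: List[RegionServer]) -> RegionServer:
    # Prefer servers that are still under their cap.
    under_cap = [s for s in servers if len(s.regions) < s.max_regions]
    # Best effort: if every server is full (e.g. after losing a rack),
    # fall back to all servers instead of leaving the region unassigned.
    candidates = under_cap or servers
    return min(candidates, key=lambda s: s.load_ratio())


def assign(region_names: List[str], servers: List[RegionServer]) -> None:
    # Place regions one at a time on the relatively least-loaded server.
    for region in region_names:
        pick_server(servers).regions.append(region)
```

With a small node capped at 2 regions and a large node capped at 6, eight
regions land 2 and 6. If more regions then arrive because other nodes were
lost, both servers go over their caps rather than refusing the regions, which
is the "best effort" behavior described above.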
> Audit behavior heterogenous node capacity
> -----------------------------------------
>
> Key: HBASE-13323
> URL: https://issues.apache.org/jira/browse/HBASE-13323
> Project: HBase
> Issue Type: Task
> Components: Balancer
> Reporter: Nick Dimiduk
> Labels: beginner
>
> From the thread "introducing nodes w/ more storage"
> (http://search-hadoop.com/m/DHED4azyle2), we should have a look at what
> happens when nodes of varying data density are used in a single cluster. The
> user would expect that nodes be filled according to their capacity, meaning
> an "even distribution" looks like all nodes at the same pct use. This
> behavior is probably in the intersection of hbase balancer and hdfs balancer.
> Probably this is made more complex by recent HDFS features such as HDFS-5682.
> After investigation, let's fix it up to work better (if it's broken), and
> document the behavior in our awesome book.