[
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980558#comment-13980558
]
stack commented on HDFS-6133:
-----------------------------
bq. I believe the goal is to keep the blocks of an hbase column on the region
server?
Yes. Locality is important: e.g. being able to do short-circuit reads (SCR)
makes a big difference in our serving latencies (and in how much CPU we end up
using servicing requests).
bq. ...it's effectively up to hbase to re-balance the cluster....
Facebook describes doing this with their HBase: add a rack and slow-fill it by
placing the 3rd replica over on the new rack. Only after the new rack is full
enough, switch on the regionservers in the new rack (the alternative is to
switch on the regionservers and cripple the top-of-the-rack switch with traffic
as regionservers read remotely). HDFS-2576 is an attempt at adding some of this
'Favored Node' facility back into apache hdfs (though we've not made use of it
over in apache hbaselandia up to this -- we've still to 'finish' up our end).
bq. If your cluster is hbase-centric, then making the balancer exclude all
hbase files has marginal value.
Well, currently the balancer may wreak havoc if it runs across the hbase
root.dir.
bq. I think what we really need is to support pinning the local replica but
allow the other remote replicas to float.
Another aspect of 'Favored Nodes' (implemented by FB, lagging in apache hbase)
was that on node crash, since we'd been 'placing' the blocks, we'd know which
machine(s) could serve the data that was on the dead machine while maintaining a
better than random locality quotient. We'd not be able to do this if the
non-locals could float.
Not suggesting that the above is the best way to go; just listing out a couple
of hbase concerns.
> Make Balancer support exclude specified path
> --------------------------------------------
>
> Key: HDFS-6133
> URL: https://issues.apache.org/jira/browse/HDFS-6133
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer, namenode
> Reporter: zhaoyunjiong
> Assignee: zhaoyunjiong
> Attachments: HDFS-6133.patch
>
>
> Currently, running the Balancer destroys the Regionserver's data locality.
> If getBlocks could exclude blocks belonging to files that have a specific
> path prefix, like "/hbase", then we could run the Balancer without destroying
> the Regionserver's data locality.
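
A minimal sketch of the path-prefix exclusion described above, not the code in
HDFS-6133.patch. It shows only the matching rule, on plain strings. The class
BalancerPathExclusion and its methods are hypothetical names, and matching on
whole path components (so that "/hbase" does not also catch "/hbase-staging")
is an assumption rather than something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decides which file paths a balancer run should skip. */
final class BalancerPathExclusion {

    /** True if path equals prefix or lies beneath it, compared per path component. */
    static boolean isUnder(String path, String prefix) {
        // Normalize a trailing slash ("/hbase/" -> "/hbase"), but keep "/" itself.
        String p = (prefix.length() > 1 && prefix.endsWith("/"))
                ? prefix.substring(0, prefix.length() - 1)
                : prefix;
        if (p.equals("/")) {
            return path.startsWith("/");
        }
        return path.equals(p) || path.startsWith(p + "/");
    }

    /** True if path falls under any of the excluded prefixes. */
    static boolean isExcluded(String path, List<String> excludedPrefixes) {
        for (String prefix : excludedPrefixes) {
            if (isUnder(path, prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the paths whose blocks would remain candidates for moving. */
    static List<String> movablePaths(List<String> paths, List<String> excludedPrefixes) {
        List<String> movable = new ArrayList<>();
        for (String path : paths) {
            if (!isExcluded(path, excludedPrefixes)) {
                movable.add(path);
            }
        }
        return movable;
    }
}
```

With "/hbase" as the only excluded prefix, a file under /hbase/data would be
skipped while /user/logs/part-0 and /hbase-staging/tmp would stay movable. A
real change would have to apply a check like this where the namenode answers
getBlocks, which this sketch does not attempt.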