[
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980504#comment-13980504
]
Daryn Sharp commented on HDFS-6133:
-----------------------------------
A main concern we have is the performance/scalability of the proposed patch.
Looking up paths for every block is not trivial. The namespace lock is going
to be held for a much longer duration leading to performance degradation. GC
pressure will also increase from all the temporary arrays, byte to string
conversions, and StringBuilder instances needed to resolve the path.
Another concern is this design approach. The BM shouldn't be path aware. We
are trying to scale the performance and size of the namespace with a long term
goal of the BM being an independent service. As such, the BM cannot have a
direct reference to the namesystem, which means it cannot be path aware. Not
to mention it violates the path -> inode -> block layer abstractions if a block
is path aware.
--
I understand and share your desire to prevent balancing from destroying
hbase performance. I believe the goal is to keep the blocks of an hbase column
on the region server? Pinning based on path pins all replicas, so when you
increase your cluster capacity, it's effectively up to hbase to re-balance the
cluster by recreating files. If your cluster is hbase-centric, then making the
balancer exclude all hbase files has marginal value.
I think what we really need is to support pinning the local replica but allow
the other remote replicas to float.
Perhaps a new flag to create/append could be used to instruct the data streamer
to pin the local replica. The flag is part of the block's metadata (whether in
the file or elsewhere). The DN may then refuse balancer requests to replace
the block. If done correctly, it's completely backwards compatible and
requires no NN changes.
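
To make the suggestion above concrete, here is a minimal Java sketch of the
DN-side behavior. It is only an illustration under stated assumptions: the
class and method names (PinnedReplicaSketch, finalizeReplica,
acceptBalancerMove) are hypothetical and are not existing HDFS APIs, and a
real implementation would persist the pin flag with the block's metadata
rather than keep it in memory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a DataNode that honors a per-replica pin flag.
 * None of these names exist in HDFS; they only illustrate the idea.
 */
public class PinnedReplicaSketch {
    /** IDs of blocks whose local replica was written with the assumed pin flag. */
    private final Set<Long> pinned = ConcurrentHashMap.newKeySet();

    /**
     * Called when the local replica of a block is finalized.
     * pinRequested stands in for the proposed create/append flag.
     */
    public void finalizeReplica(long blockId, boolean pinRequested) {
        if (pinRequested) {
            pinned.add(blockId);
        }
    }

    /**
     * Called when the balancer asks this DN to give up a replica.
     * Returns false (refuse) for pinned replicas, true otherwise.
     */
    public boolean acceptBalancerMove(long blockId) {
        return !pinned.contains(blockId);
    }

    public static void main(String[] args) {
        PinnedReplicaSketch dn = new PinnedReplicaSketch();
        dn.finalizeReplica(1L, true);   // local replica written with the pin flag
        dn.finalizeReplica(2L, false);  // ordinary replica, free to float
        System.out.println(dn.acceptBalancerMove(1L)); // false
        System.out.println(dn.acceptBalancerMove(2L)); // true
    }
}
```

In this sketch the decision is made entirely on the DN from per-block state,
so the NN and the BM never need to resolve a path, and the replicas on other
DNs remain movable.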
> Make Balancer support exclude specified path
> --------------------------------------------
>
> Key: HDFS-6133
> URL: https://issues.apache.org/jira/browse/HDFS-6133
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer, namenode
> Reporter: zhaoyunjiong
> Assignee: zhaoyunjiong
> Attachments: HDFS-6133.patch
>
>
> Currently, running the Balancer destroys the Regionserver's data locality.
> If getBlocks could exclude blocks belonging to files that have a specific
> path prefix, like "/hbase", then we could run the Balancer without destroying
> the Regionserver's data locality.