[
https://issues.apache.org/jira/browse/HADOOP-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708766#action_12708766
]
dhruba borthakur commented on HADOOP-3799:
------------------------------------------
> Especially since in order to be stable under the rebalancer
Oh guys, you are going too far! I am talking about a faster cycle of innovation and
iteration. A pluggable interface allows the Hadoop community to experiment
with newer methods of block placement. Only once such a placement algorithm proves
beneficial and helpful does the related question of "how to make the balancer
work with the new placement policy" come to my mind. If experiments show
that there isn't any viable alternative pluggable policy, then the question of
"does the balancer work with a pluggable policy" is moot.
> hdfs probably needs to store metadata with the files or blocks
I do not like this approach. It makes HDFS heavy, clunky, and difficult to
maintain. Have you seen what happened to other file systems that tried to do
everything inside them, e.g. DCE-DFS? It is possible that HDFS might allow
generic blobs to be stored with files (aka extended file attributes)
where application-specific data can be kept. But that should be disassociated
from a "requirement" that the archival policy be stored with file metadata.
Again folks, I agree completely with you that a "finished product" needs to
encompass the "balancer". But to start experimenting to figure out whether a
different placement policy is beneficial at all, I need the pluggability
feature; otherwise I have to keep changing my Hadoop source code every time I
want to experiment. My experiments will probably take three to six months,
especially because I want to benchmark results at large scale.
For installations that go with the default policy, there is no impact at all.
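To make concrete what such pluggability could look like, here is a minimal Java sketch. The interface and class names are hypothetical illustrations, not the API proposed in the attached patch: an experimental policy could be swapped in without touching the NameNode's baked-in logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pluggable placement interface (illustrative only;
 *  not the actual HDFS API from the attached patch). */
interface BlockPlacementPolicy {
    /** Choose datanodes to hold the replicas of a new block. */
    List<String> chooseTargets(String path, int replicas, List<String> liveNodes);
}

/** Stand-in for the current default behavior: pick the first N live
 *  nodes (the real code does rack-aware selection in the NameNode). */
class DefaultPolicy implements BlockPlacementPolicy {
    public List<String> chooseTargets(String path, int replicas, List<String> liveNodes) {
        return new ArrayList<>(liveNodes.subList(0, Math.min(replicas, liveNodes.size())));
    }
}

public class PlacementDemo {
    public static void main(String[] args) {
        // Swap in an experimental policy here without changing NameNode code.
        BlockPlacementPolicy policy = new DefaultPolicy();
        List<String> targets = policy.chooseTargets(
                "/user/data/file1", 3, Arrays.asList("dn1", "dn2", "dn3", "dn4"));
        System.out.println(targets); // prints [dn1, dn2, dn3]
    }
}
```

Installations that keep the default policy would simply get the default implementation; only experimenters would plug in something else.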
> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>
> Key: HADOOP-3799
> URL: https://issues.apache.org/jira/browse/HADOOP-3799
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: BlockPlacementPluggable.txt
>
>
> The current HDFS code typically places one replica on local rack, the second
> replica on remote random rack and the third replica on a random node of that
> remote rack. This algorithm is baked in the NameNode's code. It would be nice
> to make the block placement algorithm a pluggable interface. This will allow
> experimentation with different placement algorithms based on workloads,
> availability guarantees and failure models.