[
https://issues.apache.org/jira/browse/HADOOP-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707337#action_12707337
]
Tom White commented on HADOOP-3799:
-----------------------------------
Dhruba, these look like good changes - glad to see this moving forward. More
comments below:
* Can BlockPlacementInterface be an abstract class? I would also change its
name to drop the "Interface" suffix - something like ReplicationPolicy or
BlockPlacementPolicy. ReplicationTargetChooser could then be renamed to
DoubleRackReplicationPolicy, DoubleRackBlockPlacementPolicy, or similar, to
better describe its role. (There's a rough sketch of what I have in mind at
the end of this comment.)
* Why doesn't ReplicationPolicy simply pass through verifyBlockPlacement()? It
seems odd that it's doing extra work here.
* BlockPlacementInterface#chooseTarget(). Make excludedNodes a
List<DatanodeDescriptor>. Implementations may choose to turn it into a map if
they need to, but for the interface, it should just be a list, shouldn't it?
* For future evolution, can we pass a Configuration to the initialize() method,
rather than the considerLoad boolean?
* Rather than passing the full FSNamesystem to the initialize method, it would
be preferable to create an interface for the part that the block placement
strategy needs. Something like FSNamespaceStats, which only needs
getTotalLoad() for the moment. I think this is an acceptable use of an
interface, since it is only used by developers writing a new block placement
strategy. There's a similar situation for job scheduling in MapReduce:
JobTracker implements the package-private TaskTrackerManager interface so that
TaskScheduler doesn't have to pull in the whole JobTracker. This helps a lot
with testing.
* These changes should make it possible to unit test ReplicationTargetChooser
directly. This could be another Jira.
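To make this concrete, here is a rough sketch of the shape I have in mind.
The parameter lists and imports are only illustrative (loosely borrowed from
the current ReplicationTargetChooser and trunk package layout), so please read
it as a sketch of the structure rather than a proposal for the exact API:
{code}
// Sketch only: parameter lists and imports are illustrative, not the final API.
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor;

public abstract class BlockPlacementPolicy {

  /** Takes a Configuration instead of the considerLoad boolean, and a narrow
      stats interface instead of the whole FSNamesystem. */
  public abstract void initialize(Configuration conf, FSNamespaceStats stats);

  /** Choose numOfReplicas targets for a new block, skipping excludedNodes,
      which stays a plain List at the interface level; implementations can
      build a map from it internally if they need fast lookups. */
  public abstract DatanodeDescriptor[] chooseTarget(int numOfReplicas,
      DatanodeDescriptor writer, List<DatanodeDescriptor> excludedNodes,
      long blocksize);

  /** Verify that the current placement of a block satisfies the policy;
      the patch can decide the exact return value (e.g. racks still needed). */
  public abstract int verifyBlockPlacement(String srcPath, LocatedBlock lBlk,
      int minRacks);
}

/** Package-private view of FSNamesystem for placement policies, analogous to
    TaskTrackerManager for TaskScheduler in MapReduce. */
interface FSNamespaceStats {
  int getTotalLoad();
}
{code}
The existing ReplicationTargetChooser would then become a subclass
(DoubleRackBlockPlacementPolicy or similar), and a unit test could drive it
through this class directly without standing up a NameNode.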
> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>
> Key: HADOOP-3799
> URL: https://issues.apache.org/jira/browse/HADOOP-3799
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: BlockPlacementPluggable.txt
>
>
> The current HDFS code typically places one replica on the local rack, the
> second replica on a random remote rack, and the third replica on a random
> node of that remote rack. This algorithm is baked into the NameNode's code.
> It would be nice to make the block placement algorithm a pluggable interface.
> This will allow experimentation with different placement algorithms based on
> workloads, availability guarantees, and failure models.