[ 
https://issues.apache.org/jira/browse/HADOOP-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707337#action_12707337
 ] 

Tom White commented on HADOOP-3799:
-----------------------------------

Dhruba, these look like good changes; glad to see this moving forward. More
comments below:

* Can BlockPlacementInterface be an abstract class? I would also change its 
name to not have the "Interface" suffix, something like ReplicationPolicy, or 
BlockPlacementPolicy. ReplicationTargetChooser could be renamed something like 
DoubleRackReplicationPolicy or DoubleRackBlockPlacementPolicy or similar, to 
better describe its role.
* Why doesn't ReplicationPolicy simply pass through verifyBlockPlacement()? It 
seems odd that it's doing extra work here.
* BlockPlacementInterface#chooseTarget(). Make excludedNodes a 
List<DatanodeDescriptor>. Implementations may choose to turn it into a map if 
they need to, but for the interface, it should just be a list, shouldn't it?
* To allow for future evolution, can we pass a Configuration to the initialize()
method rather than the considerLoad boolean?
* Rather than passing the full FSNamesystem to the initialize method, it would 
be preferable to create an interface for the part that the block placement 
strategy needs. Something like FSNamespaceStats, which only needs 
getTotalLoad() for the moment. I think this is an acceptable use of an
interface, since it is only used by developers writing a new block placement
strategy. There's a similar situation for job scheduling in MapReduce:
JobTracker implements the package-private TaskTrackerManager interface so that 
TaskScheduler doesn't have to pull in the whole JobTracker. This helps a lot 
with testing.
* These changes should make it possible to unit test ReplicationTargetChooser 
directly. This could be another Jira.
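A minimal Java sketch of the shape these suggestions point towards, not the attached patch. BlockPlacementPolicy and FSNamespaceStats are the names proposed above; everything else is hypothetical and only illustrates the idea: the Datanode stand-in for DatanodeDescriptor, the Map<String, String> stand-in for Hadoop's Configuration, the "placement.considerLoad" key, FirstFitPlacementPolicy, and its "twice the average load" cut-off.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for DatanodeDescriptor: name, rack and current load.
final class Datanode {
    final String name;
    final String rack;
    final int xceiverCount;

    Datanode(String name, String rack, int xceiverCount) {
        this.name = name;
        this.rack = rack;
        this.xceiverCount = xceiverCount;
    }
}

// The narrow view of the namesystem a placement policy needs; FSNamesystem
// would implement this instead of being handed to the policy whole.
interface FSNamespaceStats {
    int getTotalLoad();
}

// An abstract class (not an interface) so methods can be added later
// without breaking existing policies.
abstract class BlockPlacementPolicy {
    // Map<String, String> is a stand-in for org.apache.hadoop.conf.Configuration.
    abstract void initialize(Map<String, String> conf, FSNamespaceStats stats);

    // excludedNodes is a plain List; implementations index it however they like.
    abstract List<Datanode> chooseTarget(int numReplicas,
                                         List<Datanode> excludedNodes,
                                         List<Datanode> candidates);
}

// Hypothetical toy policy: picks the first non-excluded candidates in order,
// optionally skipping nodes busier than twice the average load.
class FirstFitPlacementPolicy extends BlockPlacementPolicy {
    private boolean considerLoad;
    private FSNamespaceStats stats;

    @Override
    void initialize(Map<String, String> conf, FSNamespaceStats stats) {
        // Options are read from the configuration instead of a boolean parameter.
        this.considerLoad =
            Boolean.parseBoolean(conf.getOrDefault("placement.considerLoad", "false"));
        this.stats = stats;
    }

    @Override
    List<Datanode> chooseTarget(int numReplicas,
                                List<Datanode> excludedNodes,
                                List<Datanode> candidates) {
        // The List is turned into a Set privately for fast membership checks.
        Set<Datanode> excluded = new HashSet<>(excludedNodes);
        double avgLoad = candidates.isEmpty()
            ? 0.0 : (double) stats.getTotalLoad() / candidates.size();
        List<Datanode> chosen = new ArrayList<>();
        for (Datanode d : candidates) {
            if (chosen.size() == numReplicas) {
                break;
            }
            if (excluded.contains(d)) {
                continue;
            }
            if (considerLoad && d.xceiverCount > 2.0 * avgLoad) {
                continue; // too busy relative to the average load
            }
            chosen.add(d);
        }
        return chosen;
    }
}
```

Because FSNamespaceStats has a single method, a test can pass a lambda such as `() -> 12` instead of a full FSNamesystem, which is the kind of direct unit testing the last point asks for.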

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3799
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3799
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: BlockPlacementPluggable.txt
>
>
> The current HDFS code typically places one replica on local rack, the second 
> replica on remote random rack and the third replica on a random node of that 
> remote rack. This algorithm is baked into the NameNode's code. It would be nice
> to make the block placement algorithm a pluggable interface. This will allow
> experimentation with different placement algorithms based on workloads,
> availability guarantees and failure models.
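A minimal Java sketch of the three-replica rule exactly as the issue text describes it. Node, DefaultPlacementSketch and chooseThree are hypothetical names; the sketch ignores load, excluded nodes, more than three replicas, and fallbacks when a rack has too few nodes, all of which a real policy would have to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical node model: just a name and a rack path.
final class Node {
    final String name;
    final String rack;

    Node(String name, String rack) {
        this.name = name;
        this.rack = rack;
    }
}

final class DefaultPlacementSketch {
    static List<Node> chooseThree(String writerRack, List<Node> cluster, Random rnd) {
        List<Node> targets = new ArrayList<>();

        // 1st replica: a random node on the writer's (local) rack.
        List<Node> local = cluster.stream()
            .filter(n -> n.rack.equals(writerRack))
            .collect(Collectors.toList());
        targets.add(pick(local, rnd));

        // 2nd replica: a random node on some rack other than the writer's.
        List<Node> remote = cluster.stream()
            .filter(n -> !n.rack.equals(writerRack))
            .collect(Collectors.toList());
        Node second = pick(remote, rnd);
        targets.add(second);

        // 3rd replica: a different random node on the same remote rack.
        List<Node> sameRemote = remote.stream()
            .filter(n -> n.rack.equals(second.rack) && n != second)
            .collect(Collectors.toList());
        targets.add(pick(sameRemote, rnd));
        return targets;
    }

    // Throws instead of falling back; a production policy would need a fallback.
    private static Node pick(List<Node> nodes, Random rnd) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no eligible node");
        }
        return nodes.get(rnd.nextInt(nodes.size()));
    }
}
```

Since chooseThree depends only on the node list and a Random, it can be exercised with a fixed seed and a handful of fake nodes.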

-- 
This message is automatically generated by JIRA.