[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886861#action_12886861 ]

Joydeep Sen Sarma commented on HDFS-1094:
-----------------------------------------

One can't have a different node-group for each block/file - that would defeat 
the whole point. (In fact, every block today effectively sits in its own 3-node 
node-group, and there are gazillions of such node-groups, all overlapping.)

The reduction in data-loss probability comes from the fact that the odds of 3 
failed nodes all falling into the same node-group are small. (If they don't fall 
into the same node-group, there is no data loss.)
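
To make that concrete, here is a back-of-the-envelope sketch in Python 
(hypothetical numbers; it assumes the node-groups are disjoint and that exactly 
3 nodes fail, chosen uniformly at random):

    from math import comb

    def p_group_wipeout(total_nodes: int, group_size: int) -> float:
        """Probability that 3 simultaneously failed nodes (uniform random)
        all land in the same node-group, when the cluster is partitioned
        into disjoint groups of group_size nodes."""
        num_groups = total_nodes // group_size
        # favorable outcomes: pick any one group, then any 3 of its members
        return num_groups * comb(group_size, 3) / comb(total_nodes, 3)

    print(p_group_wipeout(320, 5))    # ~1.2e-4
    print(p_group_wipeout(320, 320))  # 1.0 - one giant group, any 3 failures hit it

Note the probability is proportional to the number of groups times 
C(group_size, 3), which is why having more groups over the same nodes (i.e. 
overlap) makes things worse.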

If the number of node-groups is very large (because of overlaps), the 
probability of 3 failing nodes falling into the same node-group starts going 
up, simply because there are more node-groups for them to land in. The more 
exclusive the node-groups are, the better; that means minimizing the number of 
node-groups for a fixed number of nodes. As I mentioned, the size of a 
node-group is dictated to some extent by re-replication bandwidth: one wants 
very small node-groups, but that doesn't work because there isn't enough 
re-replication bandwidth (a familiar problem in RAID).
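
The RAID-style trade-off looks roughly like this (a sketch with hypothetical 
per-node figures; it assumes a failed node's data is re-created only from the 
group_size - 1 surviving members of its group, each contributing one node's 
worth of outbound bandwidth):

    def rereplication_hours(data_per_node_tb: float, group_size: int,
                            per_node_gbps: float = 1.0) -> float:
        """Rough time to re-replicate one failed node's data when only the
        surviving members of its node-group can act as copy sources."""
        tb_per_hour_per_source = per_node_gbps * 3600 / 8 / 1024  # Gbps -> TB/h
        return data_per_node_tb / ((group_size - 1) * tb_per_hour_per_source)

    # smaller groups lose data less often, but recover more slowly:
    for g in (3, 5, 10):
        print(g, round(rereplication_hours(12.0, g), 1))
    # 3 -> ~13.7h, 5 -> ~6.8h, 10 -> ~3.0h

The longer that re-replication window, the longer the data sits 
under-replicated and exposed to further failures in the same group - which is 
what pushes the group size back up.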

If you take some standard cluster (say 8 racks x 40 nodes), how many distinct 
node-groups would your algorithm end up with?
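
For scale, assuming a disjoint partition with a hypothetical group size of 5:

    from math import comb

    nodes = 8 * 40                # 320 nodes
    print(nodes // 5)             # 64 groups under a disjoint 5-node partition
    print(comb(nodes, 3))         # 5,410,240 - the degenerate case where every
                                  # 3-node subset is effectively its own group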


> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and 
> the other two replicas are on any two random nodes on a random remote rack. 
> This means that if any three datanodes die together, then there is a 
> non-trivial probability of losing at least one block in the cluster. This 
> JIRA is to discuss whether there is a better algorithm that can lower the 
> probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
