[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886993#action_12886993 ]

Joydeep Sen Sarma commented on HDFS-1094:
-----------------------------------------

:-) - of course I meant a node group = a fixed set of nodes.

And of course, replication is done across any three nodes within a node group 
(the node group being much larger than 3). I understand there's a subtext 
here: we do want the replicas to span racks, with the writer placing one 
replica locally and one within the same rack. That's all consistent, though: 
as long as the node group spans multiple racks and two replicas come from the 
writer's rack, we are good. I don't see how this is all that different from 
the protocol you listed, where you choose nodes randomly subject to some 
constraints.
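
To make the constraint concrete, here is a minimal sketch of placement within a rack-spanning node group: two replicas in the writer's rack, the third elsewhere in the same group. All names and the `node_groups` layout are hypothetical illustrations, not the HDFS BlockPlacementPolicy API.

```python
import random

def place_replicas(node_groups, writer_node, writer_rack, writer_group):
    """Pick 3 replica nodes, all drawn from the writer's fixed node group.

    node_groups: dict mapping group id -> {rack name: [node names]}
    (a purely illustrative data layout).
    """
    racks = node_groups[writer_group]
    # Two replicas in the writer's rack: the writer itself plus one peer.
    local_peers = [n for n in racks[writer_rack] if n != writer_node]
    replicas = [writer_node, random.choice(local_peers)]
    # Third replica on any node of the same group in a different rack,
    # so the three replicas still span racks.
    remote_nodes = [n for rack, nodes in racks.items()
                    if rack != writer_rack for n in nodes]
    replicas.append(random.choice(remote_nodes))
    return replicas
```

Because every choice is restricted to one group, the set of possible replica trios stays small and fixed, which is the point of the scheme.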

The main difference is that I am suggesting minimizing the number of node 
groups. Your protocol implicitly has a finite number of node groups, but I am 
trying to argue that it's much, much larger than it needs to be. The math is 
pretty obvious (though tedious): the reduction in loss probability comes from 
the small node group, but it is offset by the larger number of node groups. If 
one doesn't minimize the number of node groups, we start losing out on the 
benefits of this scheme.

Good point about racks with different numbers of nodes. Let me think about it.

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and 
> the other two replicas are on any two random nodes on a random remote rack. 
> This means that if any three datanodes die together, then there is a 
> non-trivial probability of losing at least one block in the cluster. This 
> JIRA is to discuss whether there is a better algorithm that can lower the 
> probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.