[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886903#action_12886903 ]

Joydeep Sen Sarma commented on HDFS-1094:
-----------------------------------------

my mental model is that a node group is a fixed (at any given time) set of nodes. 
together the node groups cover the entire cluster. it would be nice for the 
node groups to be exclusive - but it's not necessary. the reduction in data 
loss probability would largely come from the small size of each node group 
(combinatorially - the number of ways a small set of failures can all fall 
within one node group is much smaller than the total number of ways that 
same number of failures can happen across the whole cluster).
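
as a rough sanity check on that combinatorial claim, here's a tiny standalone 
calculation (not HDFS code - the cluster size and group size are made-up 
illustrative numbers) that counts how many 3-node failure combinations fall 
entirely inside one exclusive group versus anywhere in the cluster:

// rough back-of-the-envelope sketch: compares 3-node failure combinations
// confined to a single node group against all 3-node combinations in the
// cluster. cluster size and group size are hypothetical.
public class NodeGroupLossOdds {
    // n choose k, computed with doubles (fine for small k)
    static double choose(int n, int k) {
        double result = 1.0;
        for (int i = 0; i < k; i++) {
            result = result * (n - i) / (i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        int clusterSize = 1000;   // hypothetical number of datanodes
        int groupSize = 20;       // hypothetical node-group size
        int groups = clusterSize / groupSize;   // exclusive groups covering the cluster

        double totalTriples = choose(clusterSize, 3);          // all ways 3 nodes can fail
        double inGroupTriples = groups * choose(groupSize, 3); // failures confined to one group

        System.out.printf("total 3-node failure combinations:   %.0f%n", totalTriples);
        System.out.printf("combinations inside a single group:  %.0f%n", inGroupTriples);
        System.out.printf("fraction confined to one group:      %.6f%n",
                inGroupTriples / totalTriples);
    }
}

with those made-up numbers only a small fraction of a percent of the 3-node 
failure combinations land entirely inside one group, which is where the 
reduction in block-loss probability comes from.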

the assumption is that the writer would choose one node group (for a block it's 
writing). if node groups are exclusive and we want to have one copy locally - 
this means that the node group is fixed per node (all writers on that node 
would choose the node group to which the node belongs). that is why it seems 
that an entire file would always belong to the same node group (as Hairong has 
argued). 
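
to make that per-node choice concrete, here's a hypothetical sketch (not the 
actual BlockPlacementPolicy interface - the class, the group map, and the node 
names are all made up) of a chooser that keeps the first replica local and 
draws the remaining replicas from the writer node's fixed group:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

// hypothetical sketch: one replica stays on the writer, the rest come from
// the fixed node group the writer's node belongs to, so every block of a file
// written from that node ends up in the same group.
public class NodeGroupChooser {
    private final Map<String, List<String>> groupOfNode; // node -> members of its group (assumed precomputed)
    private final Random random = new Random();

    public NodeGroupChooser(Map<String, List<String>> groupOfNode) {
        this.groupOfNode = groupOfNode;
    }

    // choose `replication` targets: the writer first, the rest from its group
    public List<String> chooseTargets(String writerNode, int replication) {
        List<String> targets = new ArrayList<>();
        targets.add(writerNode);                      // first replica is local
        List<String> candidates = new ArrayList<>(groupOfNode.get(writerNode));
        candidates.remove(writerNode);
        Collections.shuffle(candidates, random);      // remaining replicas from the same group
        for (String node : candidates) {
            if (targets.size() >= replication) break;
            targets.add(node);
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = Map.of(
                "dn1", List.of("dn1", "dn2", "dn3", "dn4"),
                "dn2", List.of("dn1", "dn2", "dn3", "dn4"));
        System.out.println(new NodeGroupChooser(groups).chooseTargets("dn1", 3));
    }
}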

node groups cannot be too small - because (as i mentioned) that would limit 
re-replication bandwidth (and therefore start increasing time to recovery for a 
single bad disk/node). we really need a plot of time to recovery vs. node-group 
size to come up with a safe size for the node group. having exclusive groups is 
nice because it minimizes the number of node groups for a given node group size.
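
as a stand-in for that plot, here's a crude illustrative sweep (the per-node 
data volume and per-node rebuild bandwidth are made-up assumptions) of 
estimated recovery time vs. group size, assuming only the surviving group 
members can act as re-replication sources:

// illustrative only: crude estimate of single-node recovery time as a
// function of node-group size, assuming the (groupSize - 1) surviving group
// members are the only re-replication sources. numbers are hypothetical.
public class RecoveryTimeSweep {
    public static void main(String[] args) {
        double dataPerNodeTB = 12.0;        // hypothetical data on the failed node (TB)
        double perNodeBandwidthMBs = 50.0;  // hypothetical rebuild bandwidth per source node (MB/s)

        for (int groupSize : new int[] {5, 10, 20, 40, 80}) {
            double aggregateMBs = (groupSize - 1) * perNodeBandwidthMBs;
            double seconds = dataPerNodeTB * 1024 * 1024 / aggregateMBs;  // TB -> MB
            System.out.printf("group size %3d -> recovery ~ %.1f hours%n",
                    groupSize, seconds / 3600.0);
        }
    }
}

the point of the sweep is just that recovery time scales roughly inversely 
with group size, which is the tension against making groups small for loss 
probability.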

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and 
> the other two replicas are on any two random nodes on a random remote rack. 
> This means that if any three datanodes die together, then there is a 
> non-trivial probability of losing at least one block in the cluster. This 
> JIRA is to discuss whether there is a better algorithm that can lower the 
> probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
