[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886903#action_12886903 ]
Joydeep Sen Sarma commented on HDFS-1094:
-----------------------------------------

My mental model is that a node group is a fixed (at any given time) set of nodes, and that together the node groups cover the entire cluster. It would be nice for the node groups to be exclusive, but it is not necessary. The reduction in data-loss probability comes largely from the small size of each node group: combinatorially, the number of ways a small set of failures can all fall within one node group is much smaller than the total number of ways the same number of failures can occur anywhere in the cluster.

The assumption is that the writer chooses one node group for each block it writes. If node groups are exclusive and we want to keep one copy locally, then the node group is fixed per node (all writers on that node choose the node group to which the node belongs). That is why an entire file would always belong to the same node group (as Hairong has argued).

Node groups cannot be too small, because (as I mentioned) that would limit re-replication bandwidth and therefore increase the time to recover from a single bad disk or node. We really need a plot of time to recovery vs. node-group size to come up with a safe size for the node group. Having exclusive groups is nice because it minimizes the number of node groups for a given node-group size. (Rough sketches of both calculations follow the quoted issue description below.)

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>             Key: HDFS-1094
>             URL: https://issues.apache.org/jira/browse/HDFS-1094
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: name-node
>        Reporter: dhruba borthakur
>        Assignee: Rodrigo Schmidt
>     Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and the other two replicas are on any two random nodes on a random remote rack. This means that if any three datanodes die together, there is a non-trivial probability of losing at least one block in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the probability of losing a block.
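To make the combinatorial argument in the comment above concrete, here is a hypothetical back-of-the-envelope sketch (my own illustrative numbers, not taken from the attached prob.pdf): for a cluster of n nodes with 3-way replication, it counts how many 3-node failure sets can destroy all replicas of some block under unconstrained random placement versus exclusive node groups of size g.

{code:python}
from math import comb

def fatal_triples_random(n):
    # Without node groups, every 3-node failure set is potentially fatal:
    # some block's three replicas may sit on exactly those three nodes.
    return comb(n, 3)

def fatal_triples_grouped(n, g):
    # With exclusive node groups of size g tiling the cluster, a 3-node
    # failure set is fatal only if all three nodes fall inside one group.
    assert n % g == 0, "sketch assumes groups exactly tile the cluster"
    return (n // g) * comb(g, 3)

n = 1200  # assumed cluster size
for g in (12, 24, 60, 120):
    frac = fatal_triples_grouped(n, g) / fatal_triples_random(n)
    print("group size %4d: fraction of fatal 3-node failure sets = %.2e" % (g, frac))
{code}

For n = 1200 and g = 24, for example, only 50 * C(24,3) = 101,200 of the C(1200,3) ≈ 2.87e8 possible failure triples are fatal, a reduction of roughly three orders of magnitude.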
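The recovery-time side of the trade-off can be sketched the same way, again under assumed numbers: when a node fails, the surviving replicas of its blocks live only on the other g-1 members of its group, so aggregate re-replication bandwidth scales with g-1 and small groups lengthen recovery. The per-node data size and recovery bandwidth below are illustrative placeholders, not measurements.

{code:python}
def recovery_hours(node_data_tb, g, per_node_mb_s=50.0):
    # Data on the dead node must be re-replicated by the g-1 surviving
    # group members; assume each devotes per_node_mb_s to recovery traffic.
    node_data_mb = node_data_tb * 1024.0 * 1024.0
    return node_data_mb / ((g - 1) * per_node_mb_s) / 3600.0

for g in (12, 24, 60, 120):
    print("group size %4d: ~%.1f h to re-replicate a 4 TB node" % (g, recovery_hours(4, g)))
{code}

Plotting this alongside the fatal-triple fraction from the previous sketch would give exactly the time-to-recovery vs. node-group-size picture asked for above: loss probability falls and recovery time rises as g shrinks, and the safe group size is where both are acceptable.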