[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886861#action_12886861 ]
Joydeep Sen Sarma commented on HDFS-1094:
-----------------------------------------

One can't have a different node-group for each block/file. That would defeat the whole point. (In fact, every block today is in a 3-node node-group, and there are gazillions of such node-groups that overlap.)

The reduction in data loss probability comes from the fact that the odds of 3 failing nodes falling into the same node-group are small. (If they don't fall into the same node-group, there's no data loss.) If the number of node-groups is very large (because of overlaps), then the probability of 3 failing nodes falling into the same node-group starts going up (just because there are more node-groups to choose from).

The more exclusive the node-groups are, the better. That means the number of node-groups is minimized for a constant number of nodes.

As I mentioned, the size of the node-group is dictated to some extent by re-replication bandwidth. One wants very small node-groups, but that doesn't work because there's not enough re-replication bandwidth (a familiar problem in RAID).

If you take some standard cluster (say 8 racks x 40 nodes), how many distinct node-groups would your algorithm end up with?

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and
> the other two replicas are on any two random nodes on a random remote rack.
> This means that if any three datanodes die together, then there is a
> non-trivial probability of losing at least one block in the cluster. This
> JIRA is to discuss if there is a better algorithm that can lower probability
> of losing a block.
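
The node-group argument in the comment above can be put into rough numbers. What follows is only a back-of-the-envelope sketch, not anything taken from the patch or from prob.pdf. It assumes that exactly 3 nodes fail at once, chosen uniformly at random, and that a block is lost whenever all 3 belong to one node-group (the worst case, where every 3-node subset of a group holds some block). It also ignores rack-awareness entirely. The function names are made up for illustration.

```python
from math import comb


def loss_probability_disjoint(num_nodes: int, group_size: int) -> float:
    """Chance that 3 random failed nodes all sit in one node-group,
    when the nodes are split into disjoint groups of equal size."""
    if group_size < 3 or num_nodes % group_size != 0:
        raise ValueError("group_size must be >= 3 and divide num_nodes")
    num_groups = num_nodes // group_size
    # 3-node sets that lie entirely inside a single group
    bad_triples = num_groups * comb(group_size, 3)
    return bad_triples / comb(num_nodes, 3)


def loss_probability_distinct_triples(num_nodes: int, num_triples: int) -> float:
    """Chance of loss when every block picks its own 3-node group and
    num_triples distinct 3-node sets are in use across the cluster."""
    total = comb(num_nodes, 3)
    # Cannot use more distinct triples than exist
    return min(num_triples, total) / total


# The "standard cluster" from the comment: 8 racks x 40 nodes
NUM_NODES = 8 * 40

for size in (4, 8, 16, 40):
    p = loss_probability_disjoint(NUM_NODES, size)
    print(f"disjoint groups of {size:>2} nodes: P(loss) = {p:.6f}")

for triples in (10_000, 1_000_000, comb(NUM_NODES, 3)):
    p = loss_probability_distinct_triples(NUM_NODES, triples)
    print(f"{triples:>9} overlapping 3-node groups: P(loss) = {p:.6f}")
```

Under these assumptions, smaller exclusive groups give a lower loss probability, and the probability climbs toward 1 as the number of distinct overlapping 3-node groups approaches the number of possible triples. The sketch says nothing about re-replication bandwidth, which is what stops the groups from being made arbitrarily small.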