[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889910#action_12889910 ]
Scott Carey commented on HDFS-1094:
-----------------------------------

This needs to change "p" from a constant to a function of the TTR window. "Probability of a single node failing" alone is meaningless; it's concurrent failure that is the issue. The odds of concurrent node failure are linearly proportional to TTR.

I think this model needs to assume one failure at odds = 1.0, then use the odds of concurrent failure for the next two failures within the time window (a minimal numeric sketch of this follows below the quoted description). A 'constant' chance of failure raises the question: ".001 chance of failure per _what_?" The first failure happens; that is assumed. Then the next two happen with some odds within a time window. This assumes Hadoop's re-replication on failure is optimized (which it isn't; the DataNode dishes out block replication requests too slowly).

TTR is inversely proportional to the number of racks in a group for rack failure. TTR is inversely proportional to the number of racks in a group for single node failure _IF_ the combined bandwidth of the machines in the group in a rack is at least 2x the between-rack bandwidth; otherwise it is inversely proportional to the ratio of rack bandwidth to node-group bandwidth.

The result is that only the "medium" sized groups above are viable; otherwise it takes too long to get data replicated when a failure happens. Also, TTR affects the odds of data loss on larger replication counts disproportionately.

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: calculate_probs.py, failure_rate.py, prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and
> the other two replicas are on any two random nodes on a random remote rack.
> This means that if any three datanodes die together, then there is a
> non-trivial probability of losing at least one block in the cluster. This
> JIRA is to discuss if there is a better algorithm that can lower the
> probability of losing a block.
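
A minimal numeric sketch of the model described in the comment above (this is illustrative and is not the attached calculate_probs.py or failure_rate.py; the rate of 0.001/hour, the 4-hour base TTR, and all function names are assumptions). It takes the first failure as given at odds = 1.0, treats node failures as independent with a TTR-dependent failure probability, and lets TTR shrink inversely with the number of racks in the group:

import math

def p_node_fails_within(ttr_hours, rate_per_hour):
    """Chance one node fails inside the TTR window.

    For small rates this is ~ rate * TTR, i.e. linear in TTR, which is
    why the odds of concurrent failure scale linearly with TTR.
    """
    return 1.0 - math.exp(-rate_per_hour * ttr_hours)

def p_block_loss_given_first_failure(ttr_hours, rate_per_hour):
    """First failure assumed (odds = 1.0); the block is lost only if
    both remaining replicas fail within the same TTR window
    (independence assumed, replication factor 3)."""
    p = p_node_fails_within(ttr_hours, rate_per_hour)
    return p * p

rate = 0.001      # assumed: per-node failure rate, per hour
base_ttr = 4.0    # assumed: hours to re-replicate with a one-rack group

# TTR taken as inversely proportional to the number of racks in the
# group; the bandwidth-limited case is ignored here for brevity.
for racks in (1, 2, 5, 10):
    ttr = base_ttr / racks
    print("racks=%2d  TTR=%4.1fh  P(loss | first failure)=%.2e"
          % (racks, ttr, p_block_loss_given_first_failure(ttr, rate)))

Under these assumptions, doubling the racks halves TTR and roughly quarters the loss probability, since two more replicas must fail. With a higher replication count r the remaining-failure term becomes p^(r-1), so a given change in TTR moves the loss odds disproportionately, matching the closing point of the comment.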