[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889976#action_12889976 ]
Rodrigo Schmidt commented on HDFS-1094:
---------------------------------------

Hi Scott,

I totally understand your concerns. BTW, thanks for the thorough analysis you gave -- that was impressive!

I would say that the main problem is the complexity of the calculations. It was already hard to calculate things with this simplified model; adding more variables would make it harder still. Besides, I think we are more interested in permanent failures -- those that cannot be recovered -- since we are trying to reduce the odds of permanently losing data.

I'm not saying we should not use TTR; I'm just saying that we didn't aim for it in our evaluation. The main things we wanted to take from these numbers were the comparison between the different algorithms and whether it was worth changing the block placement policy at all.

I think this is a great discussion JIRA, and I'm really happy with all the great ideas people have been contributing. Having said that, it's clear to me that there is no definitive solution to the problem. Depending on how you approach it, different policies will turn out to be optimal. Take DEFAULT, RING, and DISJOINT, for instance: they are all great, valid approaches, each with its ups and downs.

I believe the main result of this JIRA is that people will start creating different probability models and algorithms depending on their use cases, and we will see several block placement policies come out of it.

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: calculate_probs.py, failure_rate.py, prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and the other two replicas are on any two random nodes on a random remote rack. This means that if any three datanodes die together, there is a non-trivial probability of losing at least one block in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the probability of losing a block.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
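To make the policy comparison concrete, here is a minimal sketch in the spirit of the attached calculate_probs.py (it is not that script; the toy models, function names, and example numbers below are assumptions made for illustration). It estimates the probability that three simultaneous node failures destroy at least one block, first under fully random replica placement and then under a DISJOINT-style placement where replicas are confined to fixed node groups.

```python
from math import comb

def prob_any_block_lost_random(num_nodes, num_blocks, replication=3, failures=3):
    """P(at least one block lost) when `failures` nodes die at once, under a
    toy model where each block's replicas occupy a uniformly random set of
    `replication` distinct nodes (rack awareness ignored). By symmetry it does
    not matter whether the failures or the placements are the random part."""
    if failures < replication:
        return 0.0  # no block can lose every replica
    # A given block dies iff all of its replicas fall inside the failed set.
    p_block = comb(failures, replication) / comb(num_nodes, replication)
    return 1.0 - (1.0 - p_block) ** num_blocks

def prob_any_block_lost_disjoint(num_nodes, num_blocks, replication=3):
    """Same question under a DISJOINT-style toy model: nodes are partitioned
    into fixed groups of size `replication`, every block's replicas live in
    one uniformly chosen group, and the `replication` failed nodes are
    uniformly random. Data is lost only if the failed set is exactly a group."""
    num_groups = num_nodes // replication
    p_failed_set_is_a_group = num_groups / comb(num_nodes, replication)
    # Given that, a block dies iff it was assigned to that unlucky group.
    p_group_nonempty = 1.0 - (1.0 - 1.0 / num_groups) ** num_blocks
    return p_failed_set_is_a_group * p_group_nonempty

# Example: 1000-node cluster, 10 million blocks, three simultaneous failures.
print(prob_any_block_lost_random(1000, 10_000_000))    # ~5.8e-02
print(prob_any_block_lost_disjoint(1000, 10_000_000))  # ~2.0e-06
```

Under these assumptions the numbers illustrate the "ups and downs" the comment mentions: the disjoint model makes a loss event roughly four orders of magnitude rarer, but when one does occur it takes out every block in the affected group, so the expected number of lost blocks works out the same in both models; which end of that trade-off is optimal depends on the use case.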