[ 
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889976#action_12889976
 ] 

Rodrigo Schmidt commented on HDFS-1094:
---------------------------------------

Hi Scott,

I totally understand your concerns. BTW, thanks for the thorough analysis you 
gave. That was impressive!

I would say that the main problem is the complexity of the calculations. It was 
already hard to calculate things with this simplified model; adding more 
variables would make it much harder. Besides, I think we are more interested in 
permanent failures -- those that cannot be recovered -- since we are trying to 
reduce the odds of permanently losing data.

I'm not saying we should not use TTR. I'm just saying that we didn't aim for 
that in our evaluation.

I guess the main thing we wanted to take from these numbers was the comparison 
between the different algorithms, and whether or not it was worth changing the 
block placement policy.

I think this is a great discussion JIRA and I'm really happy with all the ideas 
people have been contributing. Having said that, it's clear to me that there is 
no definitive solution to the problem. Depending on how you approach it, 
different policies will be optimal. For instance, take DEFAULT, RING, and 
DISJOINT: they are all valid approaches, each with its ups and downs. I believe 
the main result of this JIRA is that people will start creating different 
probability models and algorithms depending on their use cases, and we will see 
several block placement policies come out of it.


> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: calculate_probs.py, failure_rate.py, prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and 
> the other two replicas are on any two random nodes on a random remote rack. 
> This means that if any three datanodes die together, then there is a 
> non-trivial probability of losing at least one block in the cluster. This 
> JIRA is to discuss if there is a better algorithm that can lower probability 
> of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
