[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886976#action_12886976 ]
Joydeep Sen Sarma commented on HDFS-1094:
-----------------------------------------

> - Pick random rack r2 that is within R racks from r
> - Pick random machine m2 in r2 that is within window [i, (i+M-1)%racksize]

a few points:

- dangerous to choose physically contiguous racks for node groups (because of correlated failures in consecutive racks). may make things a lot worse.

- if rack numbering is based on some arithmetic (so that logically contiguous is not physically contiguous) - then one has to reason about what happens when a new rack is added (i think it's ok to leave existing replicated data as is - but it's worth talking about this case. what would the rebalancer do here?)

- easy to reduce the overlap between node-groups (and thereby decrease loss probability): instead of [i, (i+M-1)%racksize] - choose [(i / (r/M))*r/M, (i / (r/M))*r/M + M-1] // fixed offset groups of M nodes each in a rack. (a small sketch contrasting the sliding-window and fixed-offset schemes is appended at the end of this message.)

i glanced through the Ceph algorithm (http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf) - it doesn't try to do what's described here. the number of placement groups is not controlled (and that is not an objective of the algo.).

a close analog of the problem here is seen in RAID arrays. when choosing to do parity + mirroring to tolerate multiple disk failures - one can choose to mirror RAID-parity groups or apply parity over RAID mirrored groups (1+5 vs. 5+1). it turns out 5+1 is a lot better from a data loss probability perspective. the reasoning and math are similar: both are susceptible to data loss on 4-disk failures - but in 5+1 the 4-disk failures have to be contained within 2 of the 2-disk mirrored pairs. this is combinatorially much harder than the 1+5 case - where the 4 disk failures only have to cause 2 failures in each of the two N/2-disk groups. (the second sketch at the end of this message counts the losing failure patterns for both layouts.)

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
> The current HDFS implementation specifies that the first replica is local and
> the other two replicas are on any two random nodes on a random remote rack.
> This means that if any three datanodes die together, then there is a
> non-trivial probability of losing at least one block in the cluster. This
> JIRA is to discuss if there is a better algorithm that can lower the
> probability of losing a block.
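a quick sketch of the node-group point above (not HDFS code - just counting distinct groups). one assumption on my part: i'm reading "fixed offset groups of M nodes" as start = (i / M) * M with integer division; the expression above is written in terms of r and M, but the intent is the same - every node in a rack maps to one of racksize/M disjoint groups instead of starting its own sliding window:

{code:java}
import java.util.HashSet;
import java.util.Set;

public class NodeGroupCount {
  public static void main(String[] args) {
    int racksize = 20; // hypothetical nodes per rack
    int M = 5;         // nodes per group

    Set<String> sliding = new HashSet<>();
    Set<Integer> fixed = new HashSet<>();
    for (int i = 0; i < racksize; i++) {
      // sliding window [i, (i+M-1)%racksize]: every node starts its own group
      StringBuilder w = new StringBuilder();
      for (int k = 0; k < M; k++) {
        w.append((i + k) % racksize).append(',');
      }
      sliding.add(w.toString());

      // fixed offset: node i belongs to the group starting at (i / M) * M
      fixed.add((i / M) * M);
    }
    System.out.println("sliding windows: " + sliding.size() + " distinct groups"); // 20
    System.out.println("fixed offsets:   " + fixed.size() + " distinct groups");   // 4
  }
}
{code}

fewer distinct groups means fewer distinct replica sets, which is what lowers the probability that some small set of simultaneous node failures wipes out all copies of a block.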
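and a toy enumeration for the RAID analogy (again not HDFS code, and a disk-level approximation: for the mirrored parity groups i count data as lost whenever each copy has >= 2 failed disks, ignoring stripe alignment; N = 12 is an arbitrary choice):

{code:java}
public class RaidLossCount {
  static final int N = 12; // toy array of 12 disks

  // "5+1" as used above: parity over 2-disk mirrored pairs.
  // pair p = disks {2p, 2p+1}; data lost iff 2 pairs fail completely.
  static boolean lossParityOverMirrors(int a, int b, int c, int d) {
    int[] perPair = new int[N / 2];
    perPair[a / 2]++; perPair[b / 2]++; perPair[c / 2]++; perPair[d / 2]++;
    int fullPairs = 0;
    for (int x : perPair) if (x == 2) fullPairs++;
    return fullPairs >= 2;
  }

  // "1+5" as used above: two mirrored RAID-5 copies of N/2 disks each.
  // data lost iff at least 2 failures land in each copy.
  static boolean lossMirroredParityGroups(int a, int b, int c, int d) {
    int inFirstCopy = 0;
    for (int x : new int[] {a, b, c, d}) if (x < N / 2) inFirstCopy++;
    return inFirstCopy >= 2 && (4 - inFirstCopy) >= 2;
  }

  public static void main(String[] args) {
    int total = 0, pairLoss = 0, groupLoss = 0;
    for (int a = 0; a < N; a++)
      for (int b = a + 1; b < N; b++)
        for (int c = b + 1; c < N; c++)
          for (int d = c + 1; d < N; d++) {
            total++;
            if (lossParityOverMirrors(a, b, c, d)) pairLoss++;
            if (lossMirroredParityGroups(a, b, c, d)) groupLoss++;
          }
    // expected: 15 of the 495 four-disk failure sets lose data under 5+1,
    // versus 225 of 495 under 1+5 - a 15x difference.
    System.out.println(total + " four-disk failure sets");
    System.out.println("5+1 loses data in: " + pairLoss);
    System.out.println("1+5 loses data in: " + groupLoss);
  }
}
{code}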