[jira] Commented: (MAPREDUCE-1969) Allow raid to use Reed-Solomon erasure codes

dhruba borthakur (JIRA) Mon, 23 Aug 2010 19:24:42 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901709#action_12901709
 ]


dhruba borthakur commented on MAPREDUCE-1969:
---------------------------------------------

for all these proposals, the unwritten assumption is that all the blocks in a 
stripe belong to the same hdfs file. In that case, when the data file is 
deleted, the parity file can be deleted too.

> Allow raid to use Reed-Solomon erasure codes
> --------------------------------------------
>
>                 Key: MAPREDUCE-1969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1969
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/raid
>            Reporter: Scott Chen
>             Fix For: 0.22.0
>
>
> Currently raid uses one parity block per stripe which corrects one missing 
> block on one stripe.
> Using Reed-Solomon code, we can add any number of parity blocks to tolerate 
> more missing blocks.
> This way we can get a good file corrupt probability even if we set the 
> replication to 1.
> Here are some simple comparisons:
> 1. No raid, replication = 3:
> File corruption probability = O(p^3), Storage space = 3x
> 2. Single parity raid with stripe size = 10, replication = 2:
> File corruption probability = O(p^4), Storage space = 2.2x 
> 3. Reed-Solomon raid with parity size = 4 and stripe size = 10, replication = 
> 1:
> File corruption probability = O(p^5), Storage space = 1.4x
> where p is the missing block probability.
> Reed-Solomon code can save lots of space without compromising the corruption 
> probability.
> To achieve this, we need some changes to raid:
> 1. Add a block placement policy that knows about raid logic and do not put 
> blocks on the same stripe on the same node.
> 2. Add an automatic block fixing mechanism. The block fixing will replace the 
> replication of under replicated blocks.
> 3. Allow raid to use general erasure code. It is now hard coded using Xor.
> 4. Add a Reed-Solomon code implementation
> We are planing to use it on the older data only.
> Because setting replication = 1 hurts the data locality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-1969) Allow raid to use Reed-Solomon erasure codes

Reply via email to