[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-503:
----------------------------------

    Attachment: raid2.txt

Incorporated a few review comments:

1. Make the underlying filesystem configurable (the default is still
DistributedFileSystem).
2. The sample raid.xml lists the configuration properties that are exposed to
the administrator (a rough sketch of how these settings might be wired up
follows below).
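
For illustration only, here is a minimal sketch of the kind of settings an
administrator would control. The property names ("fs.raid.underlyingfs.impl",
"raid.config.file") and values are placeholders I am using for this example,
not necessarily the keys used in raid2.txt:

// Sketch only: property names below are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;

public class RaidConfigSketch {
  public static Configuration raidConf() {
    Configuration conf = new Configuration();
    // Underlying filesystem that the raid layer wraps; DistributedFileSystem
    // is the default (item 1 above).
    conf.set("fs.raid.underlyingfs.impl",
             "org.apache.hadoop.hdfs.DistributedFileSystem");
    // Location of the admin-maintained raid.xml that lists the exposed
    // properties (item 2 above): which paths to raid, stripe length, etc.
    conf.set("raid.config.file", "/etc/hadoop/conf/raid.xml");
    return conf;
  }
}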

@Nicolas: I created a separate JIRA, HDFS-600, to make the parity generation
algorithm pluggable. I would like to address it in a separate patch. This is
going to play a critical part if we want to reduce the physical replication
factor even further.
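
To make concrete what "pluggable" could mean here, a minimal sketch of a
parity-generation interface; the interface name and method signatures are
hypothetical and would be settled in HDFS-600, not in this patch:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical plugin interface; the real one will be defined in HDFS-600.
public interface ParityGenerator {
  /** Compute parity over one stripe of source blocks. */
  void encode(InputStream[] stripe, OutputStream parity, long blockSize)
      throws IOException;

  /** Rebuild the erased block at erasedIndex from the surviving blocks
      plus the parity. */
  void decode(InputStream[] surviving, InputStream parity, int erasedIndex,
              OutputStream recovered, long blockSize) throws IOException;
}

A simple XOR implementation tolerates one missing block per stripe; a
Reed-Solomon style implementation could tolerate more erasures and hence allow
an even lower physical replication factor.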

@Andrew: I created HDFS-582 to implement a command-line utility called
fsckraid. It will periodically verify parity bits.
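
For what it is worth, the core check such a utility might perform, assuming a
simple XOR parity scheme; the class, method, and layout below are my own
invention for illustration, not part of raid2.txt or HDFS-582:

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsckRaidSketch {
  /** Recompute XOR parity for one stripe and compare with the stored parity.
      Assumes the parity block is exactly blockSize bytes. */
  static boolean verifyStripe(FileSystem fs, Path[] sourceBlocks,
                              Path parityBlock, int blockSize)
      throws IOException {
    byte[] computed = new byte[blockSize];
    byte[] buf = new byte[blockSize];
    for (Path src : sourceBlocks) {
      try (FSDataInputStream in = fs.open(src)) {
        int off = 0;
        int n;
        while (off < blockSize && (n = in.read(buf, off, blockSize - off)) > 0) {
          off += n;
        }
        // XOR the bytes actually read into the running parity.
        for (int i = 0; i < off; i++) {
          computed[i] ^= buf[i];
        }
      }
    }
    byte[] stored = new byte[blockSize];
    try (FSDataInputStream in = fs.open(parityBlock)) {
      in.readFully(stored);
    }
    return Arrays.equals(computed, stored);
  }
}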

@Raghu, you mentioned that "this only semi-transparent to the users since they
have to use the new filesystem". In most cases, the cluster administrator sets
the value of fs.hdfs.impl to DistributedRaidFileSystem, so no users or
applications need to change in order to use this raid feature... that is what I
meant by saying that this is "transparent" to the user. I also immensely like
your idea of making the RaidNode fetch a list of corrupt blocks from the NN. As
far as I know, such an API does not exist in the NN. I will open a new JIRA to
add an API that retrieves the list of missing blocks from the NN.
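
To illustrate the transparency point: assuming the administrator has pointed
fs.hdfs.impl at DistributedRaidFileSystem in core-site.xml, an existing client
keeps working unchanged (the path below is made up):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnchangedClient {
  public static void main(String[] args) throws IOException {
    // Picks up core-site.xml, where the admin has set fs.hdfs.impl to
    // DistributedRaidFileSystem instead of DistributedFileSystem.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Ordinary HDFS client code; the raid layer sits underneath and can
    // reconstruct a missing block from parity before the read fails.
    try (FSDataInputStream in = fs.open(new Path("/user/alice/data.txt"))) {
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      System.out.println("read " + n + " bytes via " + fs.getClass().getName());
    }
  }
}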

Thanks to everybody for the review comments.

> Implement erasure coding as a layer on HDFS
> -------------------------------------------
>
>                 Key: HDFS-503
>                 URL: https://issues.apache.org/jira/browse/HDFS-503
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: raid1.txt, raid2.txt
>
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS
> file system can be reduced. Keeping three copies of the same data is very
> costly, especially when the size of storage is huge. One idea is to reduce
> the replication factor and do erasure coding of a set of blocks so that the
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available; see
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has
> described DiskReduce
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is to discuss implementation strategies that are not part of base
> HDFS, but are a layer on top of HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
