[ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HDFS-503:
----------------------------------

    Attachment: raid2.txt

Incorporated a few review comments:

1. Make the underlying filesystem configurable (the default is still DistributedFileSystem).
2. The sample raid.xml lists the configuration properties that are exposed to the administrator (a hypothetical sketch appears at the end of this message).

@Nicolas: I created a separate JIRA, HDFS-600, to make the parity generation algorithm pluggable; a rough sketch of what such a plug-in point could look like also appears at the end of this message. I would like to address it in a separate patch. This is going to play a critical part if we want to reduce the physical replication factor even further.

@Andrew: I created HDFS-582 to implement a command-line utility called fsckraid. It will periodically verify parity bits.

@Raghu, you mentioned that "this [is] only semi-transparent to the users since they have to use the new filesystem". In most cases, the cluster administrator sets the value of fs.hdfs.impl to DistributedRaidFileSystem (see the configuration sketch at the end of this message), and no users or applications need to change anything to use this raid feature. That is what I meant by saying that this is "transparent" to the user. I also very much like your idea of making the RaidNode fetch a list of corrupt blocks from the NameNode. As far as I know, such an API does not exist in the NameNode, so I will open a new JIRA to retrieve a list of missing blocks from it.

Thanks, everybody, for the review comments.

> Implement erasure coding as a layer on HDFS
> -------------------------------------------
>
>                 Key: HDFS-503
>                 URL: https://issues.apache.org/jira/browse/HDFS-503
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: raid1.txt, raid2.txt
>
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS
> file system can be reduced. Keeping three copies of the same data is very
> costly, especially when the size of storage is huge. One idea is to reduce
> the replication factor and do erasure coding of a set of blocks so that the
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available; see
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU
> describes DiskReduce:
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt
> My opinion is to discuss implementation strategies that are not part of base
> HDFS, but are a layer on top of HDFS.
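To make the transparency point concrete, here is a minimal core-site.xml fragment of the kind an administrator might deploy. fs.hdfs.impl is the standard key that maps the hdfs:// scheme to a FileSystem class; the fully qualified class name shown is an assumption, since the patch may place DistributedRaidFileSystem in a different package:

    <!-- Hypothetical core-site.xml fragment: route the hdfs:// scheme
         through the raid-aware wrapper instead of DistributedFileSystem.
         The package name below is an assumption. -->
    <property>
      <name>fs.hdfs.impl</name>
      <value>org.apache.hadoop.hdfs.DistributedRaidFileSystem</value>
    </property>

With this in place, every client that opens an hdfs:// path goes through the raid layer without any code change.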
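The raid.xml sketch referenced above: the element and property names here (srcPath, targetReplication, metaReplication, modTimePeriod) are illustrative assumptions, not necessarily the schema used in the attached patch:

    <!-- Hypothetical raid.xml: raid everything under /user/warehouse.
         All element and property names are assumptions. -->
    <configuration>
      <srcPath prefix="hdfs://namenode:9000/user/warehouse">
        <policy name="warehouse">
          <property>
            <name>targetReplication</name>
            <value>2</value>  <!-- replication of the source file once parity exists -->
          </property>
          <property>
            <name>metaReplication</name>
            <value>2</value>  <!-- replication of the generated parity file -->
          </property>
          <property>
            <name>modTimePeriod</name>
            <value>3600000</value>  <!-- only raid files unmodified for an hour (ms) -->
          </property>
        </policy>
      </srcPath>
    </configuration>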
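Lastly, the pluggable-parity sketch for HDFS-600. The interface name and method shapes are assumptions for illustration; the XOR implementation is the degenerate erasure code: one parity block per stripe, tolerating the loss of any single block, which is what lets the source replication factor drop without raising the overall probability of losing data.

    // Hypothetical plug-in point for HDFS-600; names and shapes are assumptions.
    interface ParityGenerator {
      /** Compute one parity block from a stripe of equal-sized source blocks. */
      byte[] encode(byte[][] stripe);
      /** Rebuild the block at lostIndex from the survivors plus the parity. */
      byte[] decode(byte[][] stripe, byte[] parity, int lostIndex);
    }

    /** Simplest instance: XOR parity, tolerating one lost block per stripe. */
    class XorParityGenerator implements ParityGenerator {
      @Override
      public byte[] encode(byte[][] stripe) {
        byte[] parity = new byte[stripe[0].length];
        for (byte[] block : stripe) {
          for (int i = 0; i < parity.length; i++) {
            parity[i] ^= block[i];
          }
        }
        return parity;
      }

      @Override
      public byte[] decode(byte[][] stripe, byte[] parity, int lostIndex) {
        // XORing the parity with every surviving block cancels them out,
        // leaving exactly the bytes of the lost block.
        byte[] rebuilt = parity.clone();
        for (int b = 0; b < stripe.length; b++) {
          if (b == lostIndex) {
            continue;
          }
          for (int i = 0; i < rebuilt.length; i++) {
            rebuilt[i] ^= stripe[b][i];
          }
        }
        return rebuilt;
      }
    }

A Reed-Solomon implementation behind the same interface would tolerate multiple lost blocks per stripe, at a higher CPU cost during encode and reconstruction.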