Facebook contributed some code to do something similar called HDFS RAID: http://wiki.apache.org/hadoop/HDFS-RAID
-Joey

On Jul 18, 2011, at 3:41, Da Zheng <zhengda1...@gmail.com> wrote:

> Hello,
>
> It seems that data replication in HDFS is simply copying data among nodes. Has
> anyone considered using a better encoding to reduce the data size? Say, a block
> of data is split into N pieces, and as long as M pieces survive in the network,
> we can regenerate the original data.
>
> There are many benefits to reducing the data size. It can save network
> bandwidth and disk space, and thus reduce energy consumption. Computation power
> might be a concern, but we could use a GPU to encode and decode.
>
> But maybe the idea is stupid, or it's hard to reduce the data size. I would
> like to hear your comments.
>
> Thanks,
> Da
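For anyone unfamiliar with the M-of-N idea Da describes: here is a toy sketch of the simplest possible erasure code, a single-parity (RAID-5-style) scheme where a block is split into M data pieces plus one XOR parity piece, so any M of the M+1 pieces can rebuild the block. This is my own illustration, not HDFS-RAID code; real systems (including HDFS-RAID's Reed-Solomon mode) tolerate more than one lost piece.

```python
# Toy single-parity erasure code: M data pieces + 1 XOR parity piece.
# Any M of the M+1 pieces suffice to reconstruct the original block.
# Illustration only -- not the HDFS-RAID implementation.

def encode(block: bytes, m: int) -> list:
    """Split block into m equal pieces and append an XOR parity piece."""
    assert len(block) % m == 0, "pad the block to a multiple of m first"
    size = len(block) // m
    pieces = [bytearray(block[i * size:(i + 1) * size]) for i in range(m)]
    parity = bytearray(size)
    for piece in pieces:
        for i, byte in enumerate(piece):
            parity[i] ^= byte
    return pieces + [parity]

def decode(pieces: list, m: int) -> bytes:
    """Rebuild the block from m+1 pieces, at most one of which is None
    (lost). XORing the surviving pieces recovers the missing one, because
    the XOR of all m+1 pieces is zero."""
    if None in pieces:
        missing = pieces.index(None)
        size = len(next(p for p in pieces if p is not None))
        rebuilt = bytearray(size)
        for piece in pieces:
            if piece is None:
                continue
            for i, byte in enumerate(piece):
                rebuilt[i] ^= byte
        pieces = pieces[:missing] + [rebuilt] + pieces[missing + 1:]
    return b"".join(bytes(p) for p in pieces[:m])
```

With m=3 this stores 4/3 of the original size instead of 3x for triple replication, which is the bandwidth/space saving Da is after; the trade-off is that rebuilding a lost piece requires reading m surviving pieces rather than one replica.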