Hi,

We have already thought about this.
Looks like you are talking about these features, right?
https://issues.apache.org/jira/browse/HDFS-1640
https://issues.apache.org/jira/browse/HDFS-2115
But the implementation is not yet ready in trunk.

Regards,
Uma

----- Original Message -----
From: Da Zheng <zhengda1...@gmail.com>
Date: Tuesday, July 19, 2011 9:23 am
Subject: Re: replicate data in HDFS with smarter encoding
To: common-u...@hadoop.apache.org
Cc: Joey Echeverria <j...@cloudera.com>, "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>

> So this kind of feature is desired by the community?
>
> It seems this implementation can only reduce the data size on disk
> via the background RaidNode daemon; it cannot reduce the disk
> bandwidth and network bandwidth when the client writes data to HDFS.
> It might be more interesting to reduce the disk and network
> bandwidth, although that might require modifying the implementation
> of the write pipeline in HDFS.
>
> Thanks,
> Da
>
>
> On 07/18/11 04:10, Joey Echeverria wrote:
> > Facebook contributed some code to do something similar called
> > HDFS RAID:
> >
> > http://wiki.apache.org/hadoop/HDFS-RAID
> >
> > -Joey
> >
> >
> > On Jul 18, 2011, at 3:41, Da Zheng<zhengda1...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> It seems that data replication in HDFS is simply a data copy
> >> among nodes. Has anyone considered using a better encoding to
> >> reduce the data size? Say, a block of data is split into N
> >> pieces, and as long as M pieces of data survive in the network,
> >> we can regenerate the original data.
> >>
> >> There are many benefits to reducing the data size. It can save
> >> network and disk bandwidth, and thus reduce energy consumption.
> >> Computation power might be a concern, but we could use a GPU to
> >> encode and decode.
> >>
> >> But maybe the idea is stupid, or it's hard to reduce the data
> >> size. I would like to hear your comments.
> >>
> >> Thanks,
> >> Da
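
[Editor's note: to make the M-of-N encoding idea from the thread concrete,
here is a minimal standalone Java sketch of single XOR parity, the simplest
erasure code and, as I understand it, the one HDFS RAID initially used per
stripe. With N data pieces plus one parity piece, any single lost piece can
be rebuilt from the survivors. Reed-Solomon codes generalize this to
tolerate multiple losses. This is illustrative code, not HDFS code; all
names here are made up for the demo.]

import java.util.Arrays;

public class XorParityDemo {

    // Compute the XOR parity of N equal-length data pieces.
    static byte[] parity(byte[][] pieces) {
        byte[] p = new byte[pieces[0].length];
        for (byte[] piece : pieces)
            for (int i = 0; i < p.length; i++)
                p[i] ^= piece[i];
        return p;
    }

    // Rebuild the piece at index `lost` by XOR-ing the parity
    // with every surviving piece.
    static byte[] recover(byte[][] pieces, byte[] parity, int lost) {
        byte[] rebuilt = parity.clone();
        for (int j = 0; j < pieces.length; j++)
            if (j != lost)
                for (int i = 0; i < rebuilt.length; i++)
                    rebuilt[i] ^= pieces[j][i];
        return rebuilt;
    }

    public static void main(String[] args) {
        byte[][] pieces = { "ABCD".getBytes(), "EFGH".getBytes(), "IJKL".getBytes() };
        byte[] p = parity(pieces);
        byte[] rebuilt = recover(pieces, p, 1); // pretend piece 1 was lost
        System.out.println(Arrays.equals(rebuilt, pieces[1])); // prints: true
    }
}

The storage point of the thread follows directly: 3x replication stores
3 copies of every block, while the sketch above stores N data pieces plus
1 parity (an overhead of (N+1)/N) at the cost of losing locality and
paying CPU time on reconstruction.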