Hi,

We have already thought about this.
Looks like you are talking about these features, right?
https://issues.apache.org/jira/browse/HDFS-1640
https://issues.apache.org/jira/browse/HDFS-2115
But the implementation is not yet ready in trunk.

Regards,
Uma

----- Original Message -----
From: Da Zheng <zhengda1...@gmail.com>
Date: Tuesday, July 19, 2011 9:23 am
Subject: Re: replicate data in HDFS with smarter encoding
To: common-u...@hadoop.apache.org
Cc: Joey Echeverria <j...@cloudera.com>, "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>

> So this kind of feature is desired by the community?
>
> It seems this implementation can only reduce the data size on disk
> via the background RaidNode daemon; it cannot reduce the disk
> bandwidth and network bandwidth when the client writes data to HDFS.
> It might be more interesting to reduce the disk and network
> bandwidth, although that might require modifying the implementation
> of the write pipeline in HDFS.
>
> Thanks,
> Da
>
>
> On 07/18/11 04:10, Joey Echeverria wrote:
> > Facebook contributed some code to do something similar called
> > HDFS RAID:
> >
> > http://wiki.apache.org/hadoop/HDFS-RAID
> >
> > -Joey
> >
> >
> > On Jul 18, 2011, at 3:41, Da Zheng<zhengda1...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> It seems that data replication in HDFS is simply a data copy
> >> among nodes. Has anyone considered using a better encoding to
> >> reduce the data size? Say, a block of data is split into N
> >> pieces, and as long as M pieces of data survive in the network,
> >> we can regenerate the original data.
> >>
> >> There are many benefits to reducing the data size. It can save
> >> network and disk bandwidth, and thus reduce energy consumption.
> >> Computation power might be a concern, but we could use a GPU to
> >> encode and decode.
> >>
> >> But maybe the idea is stupid, or it's hard to reduce the data
> >> size. I would like to hear your comments.
> >>
> >> Thanks,
> >> Da
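
[Editor's note: to make the M-of-N encoding idea from the thread concrete,
here is a minimal standalone Java sketch of single XOR parity, the simplest
erasure code and, as I understand it, the one HDFS RAID initially used per
stripe. With N data pieces plus one parity piece, any single lost piece can
be rebuilt from the survivors. Reed-Solomon codes generalize this to
tolerate multiple losses. This is illustrative code, not HDFS code; all
names here are made up for the demo.]

import java.util.Arrays;

public class XorParityDemo {

    // Compute the XOR parity of N equal-length data pieces.
    static byte[] parity(byte[][] pieces) {
        byte[] p = new byte[pieces[0].length];
        for (byte[] piece : pieces)
            for (int i = 0; i < p.length; i++)
                p[i] ^= piece[i];
        return p;
    }

    // Rebuild the piece at index `lost` by XOR-ing the parity
    // with every surviving piece.
    static byte[] recover(byte[][] pieces, byte[] parity, int lost) {
        byte[] rebuilt = parity.clone();
        for (int j = 0; j < pieces.length; j++)
            if (j != lost)
                for (int i = 0; i < rebuilt.length; i++)
                    rebuilt[i] ^= pieces[j][i];
        return rebuilt;
    }

    public static void main(String[] args) {
        byte[][] pieces = { "ABCD".getBytes(), "EFGH".getBytes(), "IJKL".getBytes() };
        byte[] p = parity(pieces);
        byte[] rebuilt = recover(pieces, p, 1); // pretend piece 1 was lost
        System.out.println(Arrays.equals(rebuilt, pieces[1])); // prints: true
    }
}

The storage point of the thread follows directly: 3x replication stores
3 copies of every block, while the sketch above stores N data pieces plus
1 parity (an overhead of (N+1)/N) at the cost of losing locality and
paying CPU time on reconstruction.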