We use HDFS RAID in a big way. Data older than 12 days is raided using XOR encoding (effective replication of 2.5). Data older than a few months is raided using Reed-Solomon (effective observed replication factor of 1.5). This has been running on our 60 PB cluster for about a year now.
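
To make "effective replication" concrete, here is a rough sketch of the arithmetic: stored blocks (replicated data plus replicated parity) divided by logical data blocks. The function name and every parameter value below (stripe length, parity count, replication settings) are illustrative assumptions, not the configuration of the cluster described above. The sketch also assumes that a partially filled stripe still gets a full set of parity blocks.

```python
import math


def effective_replication(file_blocks, stripe_len, parity_len,
                          data_repl, parity_repl):
    """Stored blocks per logical data block for one raided file.

    All arguments are hypothetical knobs for illustration:
      file_blocks  - number of data blocks in the file
      stripe_len   - data blocks covered by one stripe
      parity_len   - parity blocks generated per stripe (1 for XOR)
      data_repl    - replication factor kept on data blocks after raiding
      parity_repl  - replication factor of the parity blocks
    """
    # Assumption: a partial last stripe still gets parity_len parity blocks.
    stripes = math.ceil(file_blocks / stripe_len)
    parity_blocks = stripes * parity_len
    stored = file_blocks * data_repl + parity_blocks * parity_repl
    return stored / file_blocks


# Full stripe, XOR-style: 10 data blocks, 1 parity block, both kept at 2 replicas.
print(effective_replication(10, 10, 1, 2, 2))

# Full stripe, Reed-Solomon-style: 10 data blocks, 4 parity blocks, 1 replica each.
print(effective_replication(10, 10, 4, 1, 1))

# A small file that fills less than half a stripe pays relatively more for parity.
print(effective_replication(4, 10, 1, 2, 2))
```

Under these assumed settings, small files land above the full-stripe figure, which is one possible reason an observed cluster-wide factor could be higher than the ideal one. The actual cause of the numbers quoted above is not stated here.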

thanks
dhruba

On Thu, Sep 15, 2011 at 5:31 AM, Ajit Ratnaparkhi <ajit.ratnapar...@gmail.com> wrote:

> Hi,
>
> We were planning to use it for past data archival(instead of moving it to
> archival store).
> Archiving it in HDFS gives advantage of making it easily available for
> processing whenever required.
>
> Is there any archival solution in hadoop ecosystem?
>
> thanks,
> Ajit.
>
>
> On Thu, Sep 15, 2011 at 5:05 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Hey Ajit,
>>
>> HDFS-RAID was never part of the 0.20 release. It made its debut in the
>> 0.21 release [1]. I know that Facebook uses it (and also did develop
>> it), but unsure of users beyond Facebook.
>>
>> While 0.21 overall is not entirely deemed as production-usable yet
>> (and is in fact, possibly abandoned for efforts on 0.22+), you can
>> give that release a whirl on a test cluster and see for yourself if
>> your need beats the stability.
>>
>> Just curious though - why are you looking to use this specifically?
>>
>> [1] -
>> http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21/mapreduce/src/contrib/raid/
>>
>> On Thu, Sep 15, 2011 at 4:37 PM, Ajit Ratnaparkhi
>> <ajit.ratnapar...@gmail.com> wrote:
>> > Hi,
>> > We want to use HDFS-RAID in our production cluster.
>> > (http://wiki.apache.org/hadoop/HDFS-RAID)
>> > I am not able to find source/binaries/configs for this in official hadoop
>> > distribution from apache hadoop. (checked in 0.20.1 and 0.20.2).
>> > Can somebody please tell me where can I find that? and installation
>> > procedure?
>> > Also, is HDFS-RAID implementation stable enough to use in production?
>> > thanks,
>> > Ajit.
>> >
>>
>> --
>> Harsh J
>>
>

--
Connect to me at http://www.facebook.com/dhruba