> I will be very grateful to you if you merge and contribute it to Apache 
> Hadoop 0.20.2xx.x.

Hmm... I see what you mean. I was naive about what "branch-20-warehouse" is. I 
was looking for an updated HDFS RAID that incorporated R-S coding but ran 
against a 20-ish HDFS. I suppose it is relatively easy to have an HDFS RAID 
close to what is in trunk if HDFS has evolved in your branch. :-)


It looks like the changes to HDFS can be teased apart as:

  - BlockMissingException

  - Listing file status and block locations: LocatedFileStatus,
    FileSystem.listLocatedStatus

  - Corrupt file reporting

     - Changes to FSNamesystem and UnderReplicatedBlocks for tracking and
       reporting corrupt blocks

     - Update to the ClientProtocol for listing corrupt file blocks:
       listCorruptFileBlocks()

     - DFSUtil.getCorruptFiles

  - Change visibility and constructor for datanode.BlockSender so RAID can
    send repaired blocks without needing to be a DataNode or without
    reimplementing the packet protocol

  - A set of quite invasive changes to the NameNode dealing with pluggable
    block placement policies. RAID could possibly live without this, but the
    PlacementMonitor would have more work to do in that case


I suppose the upside to considering a backport of all of this into an 
0.20.2xx release is that all of the above has already gone through trunk.


Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


>________________________________
>From: Dhruba Borthakur <dhr...@gmail.com>
>To: hdfs-user@hadoop.apache.org; Andrew Purtell <apurt...@apache.org>
>Sent: Tuesday, September 20, 2011 9:49 AM
>Subject: Re: Need help regarding HDFS-RAID
>
>
>Hi Andy,
>
>
>I will be very grateful to you if you merge and contribute it to Apache 
>Hadoop 0.20.2xx.x.
>
>
>thanks,
>dhruba
>
>
>On Tue, Sep 20, 2011 at 9:03 AM, Andrew Purtell <apurt...@apache.org> wrote:
>
>Hi Dhruba,
>>
>>Thanks for the pointer. I'm going to try and pull this code into our internal 
>>20-ish distro. Would you object if I make a contribution of that result if it 
>>is successful?
>>
>>
>>
>>Best regards,
>>
>>
>>    - Andy
>>
>>Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
>>Tom White)
>>
>>>________________________________
>>>From: Dhruba Borthakur <dhr...@gmail.com>
>>>To: Andrew Purtell <apurt...@apache.org>
>>>Cc: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
>>>Sent: Tuesday, September 20, 2011 2:18 AM
>>
>>>Subject: Re: Need help regarding HDFS-RAID
>>>
>>>
>>>Hi andy,
>>>
>>>
>>>we do run a version of HDFS RAID that is backported from Apache trunk to a 
>>>0.20 based release. Our code is 
>>>in https://github.com/facebook/hadoop-20-warehouse/tree/master/src/contrib/raid
>>>But I do not have an elegant way to contribute this code to 
>>>Apache 0.20.2xx.x. 
>>>
>>>
>>>thanks,
>>>dhruba
>>>
>>>
>>>On Sat, Sep 17, 2011 at 9:16 AM, Andrew Purtell <apurt...@apache.org> wrote:
>>>
>>>Hi Dhruba,
>>>>
>>>>
>>>>Would you consider a contribution of this to branch-0.20-security 
>>>>aka 0.20.2xx.x?
>>>>
>>>>
>>>>If I am mistaken and you do not have a 0.22-ish HDFS RAID backported to an 
>>>>0.20-ish platform, please disregard.
>>>>
>>>>
>>>>Best regards,
>>>>
>>>>
>>>>    - Andy
>>>>
>>>>Problems worthy of attack prove their worth by hitting back. - Piet Hein 
>>>>(via Tom White)
>>>>
>>>>
>>>>>________________________________
>>>>>From: Dhruba Borthakur <dhr...@gmail.com>
>>>>>To: hdfs-user@hadoop.apache.org; Andrew Purtell <apurt...@apache.org>
>>>>>Sent: Thursday, September 15, 2011 10:14 AM
>>>>>
>>>>>Subject: Re: Need help regarding HDFS-RAID
>>>>>
>>>>>
>>>>>
>>>>>That's right Andy. 0.22+. We are running a HDFS-RAID code base that is 
>>>>>pretty close to what is available in Apache hdfs trunk.
>>>>>
>>>>>
>>>>>-dhruba
>>>>>
>>>>>
>>>>>On Thu, Sep 15, 2011 at 10:08 AM, Andrew Purtell <apurt...@apache.org> 
>>>>>wrote:
>>>>>
>>>>>But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba?
>>>>>>
>>>>>> 
>>>>>>Best regards,
>>>>>>
>>>>>>
>>>>>>       - Andy
>>>>>>
>>>>>>Problems worthy of attack prove their worth by hitting back. - Piet Hein 
>>>>>>(via Tom White)
>>>>>>
>>>>>>
>>>>>>>________________________________
>>>>>>>From: Dhruba Borthakur <dhr...@gmail.com>
>>>>>>>To: hdfs-user@hadoop.apache.org
>>>>>>>Sent: Thursday, September 15, 2011 10:06 AM
>>>>>>>Subject: Re: Need help regarding HDFS-RAID
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>We use HDFS RAID in a big way. Data older than 12 days is RAIDed using 
>>>>>>>XOR encoding (effective replication of 2.5). Data older than a few 
>>>>>>>months is RAIDed using ReedSolomon (effective observed replication 
>>>>>>>factor of 1.5). This has been running on our 60 PB cluster for about a 
>>>>>>>year now.
>>>>>>>
>>>>>>>
>>>>>>>thanks
>>>>>>>dhruba
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>On Thu, Sep 15, 2011 at 5:31 AM, Ajit Ratnaparkhi 
>>>>>>><ajit.ratnapar...@gmail.com> wrote:
>>>>>>>
>>>>>>>Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>>We were planning to use it for archiving past data (instead of moving 
>>>>>>>>it to an archival store).
>>>>>>>>Archiving it in HDFS has the advantage of keeping it easily available 
>>>>>>>>for processing whenever required.
>>>>>>>>
>>>>>>>>
>>>>>>>>Is there any archival solution in the Hadoop ecosystem?
>>>>>>>>
>>>>>>>>
>>>>>>>>thanks,
>>>>>>>>Ajit.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>On Thu, Sep 15, 2011 at 5:05 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>Hey Ajit,
>>>>>>>>>
>>>>>>>>>HDFS-RAID was never part of the 0.20 release. It made its debut in the
>>>>>>>>>0.21 release [1]. I know that Facebook uses it (and also did develop
>>>>>>>>>it), but unsure of users beyond Facebook.
>>>>>>>>>
>>>>>>>>>While 0.21 overall is not entirely deemed as production-usable yet
>>>>>>>>>(and is in fact, possibly abandoned for efforts on 0.22+), you can
>>>>>>>>>give that release a whirl on a test cluster and see for yourself if
>>>>>>>>>your need beats the stability.
>>>>>>>>>
>>>>>>>>>Just curious though - why are you looking to use this specifically?
>>>>>>>>>
>>>>>>>>>[1] - 
>>>>>>>>>http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21/mapreduce/src/contrib/raid/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On Thu, Sep 15, 2011 at 4:37 PM, Ajit Ratnaparkhi
>>>>>>>>><ajit.ratnapar...@gmail.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> We want to use HDFS-RAID in our production cluster.
>>>>>>>>>> (http://wiki.apache.org/hadoop/HDFS-RAID)
>>>>>>>>>> I am not able to find source/binaries/configs for this in the 
>>>>>>>>>> official Apache Hadoop distribution (checked in 0.20.1 and 0.20.2).
>>>>>>>>>> Can somebody please tell me where I can find it, along with the
>>>>>>>>>> installation procedure?
>>>>>>>>>> Also, is the HDFS-RAID implementation stable enough to use in
>>>>>>>>>> production?
>>>>>>>>>> thanks,
>>>>>>>>>> Ajit.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>--
>>>>>>>>>Harsh J
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>--
>>>>>>>Connect to me at http://www.facebook.com/dhruba
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>--
>>>>>Connect to me at http://www.facebook.com/dhruba
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>--
>>>Connect to me at http://www.facebook.com/dhruba
>>>
>>>
>>> 
>>
>
>
>
>-- 
>Connect to me at http://www.facebook.com/dhruba
>
>
> 
