On Tue, Jul 21, 2009 at 3:26 AM, Steve Loughran <[email protected]> wrote:
> Todd Lipcon wrote:
>> On Sat, Jul 4, 2009 at 9:08 AM, David B. Ritch <[email protected]> wrote:
>>
>>> Thanks, Todd. Perhaps I was misinformed, or misunderstood. I'll make
>>> sure I close files occasionally, but it's good to know that the only
>>> real issue is with data recovery after losing a node.
>>
>> Just to be clear, there aren't issues with data recovery of already-written
>> files. The issue is that, when you open a new file to write it, Hadoop sets
>> up a pipeline that looks something like:
>>
>> Writer -> DN A -> DN B -> DN C
>>
>> where each of DN [ABC] is a datanode in your HDFS cluster. If Writer is also
>> a node in your HDFS cluster, it will attempt to make DN A the same machine
>> as Writer.
>>
>> If DN B fails, the write pipeline will reorganize itself to:
>>
>> Writer -> DN A -> DN C
>>
>> In theory I *believe* it's supposed to pick up a new datanode at this point
>> and tack it onto the end, but I'm not certain this is implemented quite yet.
>> Maybe Dhruba or someone else with more knowledge here can chime in.
>
> Sounds like a good opportunity for a fun little test - start the write on a
> 4DN (local) cluster, kill the DN in use, and check that all is well.

I have an internal ticket to write just such a test, just haven't had time to
finish it yet ;-)  Volunteers welcome!

-Todd
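
For anyone who wants to pick this up, here's a rough sketch of what such a test
might look like using MiniDFSCluster from the test framework. This is untested
and the class/method names are from memory of the 0.20-era API (constructor
signature, stopDataNode, sync), so treat it as a starting point rather than a
working test; which datanode actually lands in the pipeline isn't controlled
here either.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TestPipelineRecoveryOnDatanodeDeath {

  public void testWriterSurvivesDatanodeDeath() throws Exception {
    Configuration conf = new Configuration();
    // Bring up a local 4-datanode cluster so a replacement DN is available
    // once one pipeline member is killed.
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 4, true, null);
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      Path p = new Path("/pipeline-test");

      // Open a file with replication 3 and write some data so the
      // Writer -> DN A -> DN B -> DN C pipeline gets established.
      FSDataOutputStream out = fs.create(p, (short) 3);
      byte[] buf = new byte[64 * 1024];
      out.write(buf);
      out.sync();   // push data into the pipeline (hflush in later APIs)

      // Kill a datanode that is (hopefully) in the pipeline, then keep writing.
      cluster.stopDataNode(0);
      out.write(buf);
      out.close();

      // The file should still be readable and complete after recovery.
      long len = fs.getFileStatus(p).getLen();
      if (len != 2L * buf.length) {
        throw new AssertionError("unexpected file length: " + len);
      }
    } finally {
      cluster.shutdown();
    }
  }
}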
