On Tue, Jul 21, 2009 at 3:26 AM, Steve Loughran <[email protected]> wrote:
> Todd Lipcon wrote:
>> On Sat, Jul 4, 2009 at 9:08 AM, David B. Ritch <[email protected]> wrote:
>>
>>> Thanks, Todd. Perhaps I was misinformed, or misunderstood. I'll make
>>> sure I close files occasionally, but it's good to know that the only
>>> real issue is with data recovery after losing a node.
>>
>> Just to be clear, there aren't issues with data recovery of already-written
>> files. The issue is that, when you open a new file to write it, Hadoop sets
>> up a pipeline that looks something like:
>>
>> Writer -> DN A -> DN B -> DN C
>>
>> where each of DN [ABC] is a datanode in your HDFS cluster. If Writer is also
>> a node in your HDFS cluster, it will attempt to make DN A the same machine
>> as Writer.
>>
>> If DN B fails, the write pipeline will reorganize itself to:
>>
>> Writer -> DN A -> DN C
>>
>> In theory I *believe* it's supposed to pick up a new datanode at this point
>> and tack it onto the end, but I'm not certain this is implemented quite yet.
>> Maybe Dhruba or someone else with more knowledge here can chime in.
>
> Sounds like a good opportunity for a fun little test - start the write on a
> 4DN (local) cluster, kill the DN in use, and check that all is well.

I have an internal ticket to write just such a test, just haven't had time to
finish it yet ;-)  Volunteers welcome!

-Todd
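
For anyone who wants to pick this up, here's a rough sketch of what such a test
might look like using MiniDFSCluster from the test framework. This is untested
and the class/method names are from memory of the 0.20-era API (constructor
signature, stopDataNode, sync), so treat it as a starting point rather than a
working test; which datanode actually lands in the pipeline isn't controlled
here either.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TestPipelineRecoveryOnDatanodeDeath {

  public void testWriterSurvivesDatanodeDeath() throws Exception {
    Configuration conf = new Configuration();
    // Bring up a local 4-datanode cluster so a replacement DN is available
    // once one pipeline member is killed.
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 4, true, null);
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      Path p = new Path("/pipeline-test");

      // Open a file with replication 3 and write some data so the
      // Writer -> DN A -> DN B -> DN C pipeline gets established.
      FSDataOutputStream out = fs.create(p, (short) 3);
      byte[] buf = new byte[64 * 1024];
      out.write(buf);
      out.sync();   // push data into the pipeline (hflush in later APIs)

      // Kill a datanode that is (hopefully) in the pipeline, then keep writing.
      cluster.stopDataNode(0);
      out.write(buf);
      out.close();

      // The file should still be readable and complete after recovery.
      long len = fs.getFileStatus(p).getLen();
      if (len != 2L * buf.length) {
        throw new AssertionError("unexpected file length: " + len);
      }
    } finally {
      cluster.shutdown();
    }
  }
}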
