It would be nice if someone could help out with this. It looks like a trivial question, but it seems some blocks are being lost for us when datanodes fail...
Varun

On Fri, Apr 19, 2013 at 2:28 PM, Varun Sharma <va...@pinterest.com> wrote:
> Hi,
>
> I had an instance where a datanode died while writing a block. I am using
> Hadoop 2.0 patched with HDFS-3703 for stale node detection every 20 seconds.
>
> Looking at the namenode logs, the block being written went into the
> UNDER_RECOVERY state, and there were several internalRecoverLease() calls
> because there were readers on that block. I had a couple of questions about
> the code:
>
> 1) I see that when a block is UNDER_RECOVERY, it is added to recoverBlocks
> on each DatanodeDescriptor that holds the block. Then a recoverBlock call
> is issued to the primary datanode. What does the recoverBlock call do on
> a datanode - does it sync the block on that node to the other 2 datanodes?
> In my case one of the datanodes is unreachable; what is the behaviour in
> such a case?
>
> 2) When a client wants to read a block which is UNDER_RECOVERY, do we
> continue to suggest all 3 datanodes as replicas for reads, or do we pick
> only the one which is marked as primary for the block recovery?
>
> Thanks
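For anyone reproducing this, here is a minimal sketch (not from the original thread) of how a client can kick off and poll lease recovery on the affected file from the Java API. It assumes fs.defaultFS points at the cluster and the file path is purely hypothetical; DistributedFileSystem.recoverLease() is the public call available in Hadoop 2.0, and it returns true once the file is closed and its last block finalized.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class LeaseRecoveryCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical path to the file whose last block is stuck UNDER_RECOVERY.
        Path file = new Path(args.length > 0 ? args[0] : "/data/example.log");
        DistributedFileSystem dfs =
                (DistributedFileSystem) file.getFileSystem(conf);

        // recoverLease() asks the NameNode to begin lease/block recovery for the
        // file; it returns true only once recovery has completed and the file
        // is closed, so we poll until then.
        boolean closed = dfs.recoverLease(file);
        while (!closed) {
            Thread.sleep(1000);
            closed = dfs.recoverLease(file);
        }
        System.out.println("Lease recovered; last block finalized for " + file);
        dfs.close();
    }
}

Watching the namenode log while running a loop like this should show whether the primary datanode ever completes the recoverBlock call when one replica is unreachable.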