Hi, I had an instance where a datanode died while a block was being written to it. I am using Hadoop 2.0 patched with HDFS-3703 for stale node detection every 20 seconds.
Looking at the namenode logs, the block being written went into the UNDER_RECOVERY state, and there were several internalRecoverLease() calls because there were readers on that block. I had a couple of questions about the code:

1) I see that when a block is UNDER_RECOVERY, it is added to recoverBlocks for each dataNodeDescriptor that holds the block. Then a recoverBlock call is issued to each primary datanode. What does the recoverBlock call do on a datanode? Does it sync the block on that node to the other two datanodes? In my case one of the datanodes is unreachable; what is the behaviour in such a case?

2) When a client wants to read a block which is UNDER_RECOVERY, do we continue to suggest all three datanodes as replicas for reads, or do we pick the one which is marked as primary for the block recovery?

Thanks