I repeatedly run into the following problem with 0.20.3 and dfs.datanode.socket.write.timeout=0: an RS is asked for some data, the DFS cannot find it, and the client hangs until it times out.
Grepping the cluster logs, I can see this:

1. At some point the DFS is asked to delete a block, and the block is deleted from the datanodes.
2. Some minutes later, an RS seems to ask for exactly this block. The DFS says "Block blk_.. is not valid." and then "No live nodes contain current block".

(I have the xceivers and file descriptor limits set high, dfs.datanode.handler.count=10, no particularly high load, 17 servers with 24G/4 cores each.)

More log here: http://pastebin.com/cdqsy8Ae

Thx, Al
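
For anyone who wants to check their own logs for the same delete-then-request sequence, here is a minimal Python sketch. It rests on assumptions that are not from the post above: that every log line starts with a log4j-style timestamp (`YYYY-MM-DD HH:MM:SS,mmm`), that delete messages contain the substring "delet" together with a block ID, and that failed reads contain "is not valid" together with a block ID. The function name `find_deleted_then_requested` is made up for illustration. Adjust the patterns to whatever your logs actually contain.

```python
import re
import sys
from datetime import datetime

# Assumption: each line begins with a log4j-style timestamp.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
# Block ID without the trailing generation stamp, e.g. blk_-123 or blk_123.
BLOCK = re.compile(r"blk_-?\d+")


def find_deleted_then_requested(lines):
    """Return (block, deleted_at, requested_at) tuples for blocks that were
    reported as "not valid" at or after the time they were first deleted."""
    deleted = {}   # block id -> earliest time a delete message was seen
    invalid = []   # (block id, time) for every "is not valid" message
    for line in lines:
        ts = TS.match(line)
        blk = BLOCK.search(line)
        if not ts or not blk:
            continue
        when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S")
        block = blk.group(0)
        lowered = line.lower()
        if "is not valid" in lowered:
            invalid.append((block, when))
        elif "delet" in lowered:  # assumed marker for delete messages
            if block not in deleted or when < deleted[block]:
                deleted[block] = when
    return [(b, deleted[b], t) for b, t in invalid
            if b in deleted and deleted[b] <= t]


if __name__ == "__main__":
    # Usage: python script.py namenode.log datanode-*.log
    all_lines = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            all_lines.extend(fh)
    for block, deleted_at, requested_at in find_deleted_then_requested(all_lines):
        print(block, "deleted", deleted_at, "requested", requested_at,
              "gap", requested_at - deleted_at)
```

The function collects all matches first and compares timestamps afterwards, so the log files do not need to be merged or sorted before being passed in.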