Hi,

This is likely a multiple assignment bug.

Can you grep the NN log for the block ID 991235084167234271? This should tell you which file it was originally allocated to, as well as which IP wrote it. You should also see a deletion later. The filename should also give you a clue as to which region the block is from. You can then consult those particular RS and master logs to see which servers deleted the file and why.

-Todd

On Fri, Apr 9, 2010 at 12:56 AM, Al Lias <al.l...@gmx.de> wrote:

> I repeatedly have the following problem with
> 0.20.3/dfs.datanode.socket.write.timeout=0: Some RS is requested for
> some data, the DFS can not find it, client hangs until timeout.
>
> Grepping the cluster logs, I can see this:
>
> 1. at some time the DFS is asked to delete a block, blocks are deleted
> from the datanodes
>
> 2. some minutes later, a RS seems to ask for exactly this block...DFS
> says "Block blk_.. is not valid." and then "No live nodes contain
> current block".
>
> (I have xceivers and file desc limit high,
> dfs.datanode.handler.count=10, No particularly high load, 17 Servers with
> 24G/4Core)
>
> More log here: http://pastebin.com/cdqsy8Ae
>
> ?
>
> Thx, Al

--
Todd Lipcon
Software Engineer, Cloudera
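
If a plain grep returns too much noise, the NN log search suggested in the reply can be sketched roughly as below. This is only an illustration: the function names (`find_block_events`, `scan_log`) are made up for this example, and the keyword lists are assumptions about what allocation and deletion lines look like in your NN log, so adjust them to match what your Hadoop version actually prints.

```python
import re
from typing import Iterable, List, Tuple

BLOCK_ID = "991235084167234271"  # block ID mentioned in this thread

# ASSUMPTION: these substrings are guesses at how allocation and deletion
# lines are worded; check your own NN log and edit them as needed.
ALLOC_HINTS = ("allocateblock",)
DELETE_HINTS = ("delete", "invalidset", "invalidate")


def find_block_events(lines: Iterable[str], block_id: str) -> List[Tuple[str, str]]:
    """Return (kind, line) pairs for every log line that mentions the block."""
    # Require that no further digit follows the ID, so a longer block ID
    # that merely starts with the same digits is not matched.
    pattern = re.compile(r"blk_" + re.escape(block_id) + r"(?!\d)")
    events = []
    for line in lines:
        if not pattern.search(line):
            continue
        lowered = line.lower()
        if any(hint in lowered for hint in ALLOC_HINTS):
            kind = "ALLOC"
        elif any(hint in lowered for hint in DELETE_HINTS):
            kind = "DELETE"
        else:
            kind = "OTHER"
        events.append((kind, line.rstrip("\n")))
    return events


def scan_log(path: str, block_id: str = BLOCK_ID) -> List[Tuple[str, str]]:
    """Scan one log file (path supplied by the caller) for the block's events."""
    with open(path, errors="replace") as log:
        return find_block_events(log, block_id)
```

Calling `scan_log` on your NN log file and printing the result in order should show the allocation line (with the file path, which hints at the region) followed later by the deletion request. The same function can then be pointed at the relevant RS and master logs.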