Thanks for looking into it, Todd.

On 09.04.2010 17:16, Todd Lipcon wrote:
> Hi,
>
> This is likely a multiple assignment bug.
>

I tried again; this time I grepped for the region that a client could not
find. It looks like a case of "multiple assignment":

http://pastebin.com/CHD0KSPH

> Can you grep the NN log for the block ID 991235084167234271 ? This should
> tell you which file it was originally allocated to, as well as what IP wrote
> it. You should also see a deletion later. Also, the filename should give you
> a clue as to which region the block is from. You can then consult those
> particular RS and master logs to see which servers deleted the file and why.
>

Please help: http://pastebin.com/zUxqyyfU (not sorted by time)

I can only see that the Master advised the deletion...

(This error is a different instance of the same problem as the one above.)

Thx, Al

> -Todd
>
> On Fri, Apr 9, 2010 at 12:56 AM, Al Lias <al.l...@gmx.de> wrote:
>
>> I repeatedly have the following problem with
>> 0.20.3/dfs.datanode.socket.write.timeout=0: Some RS is requested for
>> some data, the DFS can not find it, client hangs until timeout.
>>
>> Grepping the cluster logs, I can see this:
>>
>> 1. at some time the DFS is asked to delete a block, blocks are deleted
>> from the datanodes
>>
>> 2. some minutes later, a RS seems to ask for exactly this block...DFS
>> says "Block blk_.. is not valid." and then "No live nodes contain
>> current block".
>>
>> (I have xceivers and file desc limit high,
>> dfs.datanode.handler.count=10, No particularly high load, 17 Servers with
>> 24G/4Core)
>>
>> More log here: http://pastebin.com/cdqsy8Ae
>>
>> ?
>>
>> Thx, Al
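
Since the grep output above is not sorted by time, here is a rough sketch of how the lines for one block could be collected from several logs and put in order. It is only an illustration: the function name `block_history` is made up, and it assumes each log line starts with a log4j-style timestamp such as `2010-04-09 17:16:00,123`, which may not match every setup.

```python
import re
from typing import Iterable, List

# Assumption: log lines begin with a timestamp like "2010-04-09 17:16:00,123".
# Timestamps in this form sort correctly as plain strings.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def block_history(lines: Iterable[str], block_id: str) -> List[str]:
    """Return the lines that mention blk_<block_id>, ordered by timestamp.

    block_id is the numeric part only (include a leading "-" if the ID is
    negative). Lines without a recognizable timestamp are kept at the end
    in their original order.
    """
    # (?!\d) stops the pattern from matching a longer ID with the same prefix.
    pattern = re.compile(r"blk_" + re.escape(block_id) + r"(?!\d)")
    hits = [line.rstrip("\n") for line in lines if pattern.search(line)]

    def sort_key(line: str):
        match = TIMESTAMP.match(line)
        return (0, match.group(0)) if match else (1, "")

    # sorted() is stable, so lines with equal keys keep their input order.
    return sorted(hits, key=sort_key)
```

To use it, concatenate the lines of the NN, master and RS logs (for example with `itertools.chain` over the open files) and pass them in together with the block ID. The allocation, the deletion and the later read attempt should then show up in chronological order.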