Karl Kleinpaste wrote:
On Sun, 2009-02-01 at 17:58 -0800, jason hadoop wrote:
The Datanodes use multiple threads with locking, and one of the
assumptions is that the block report (once per hour by default) takes
little time. The datanode will pause while the block report is running,
and if it happens to take a while, weird things start to happen.

Thank you for responding, this is very informative for us.

Having looked through the source code with a co-worker regarding the
periodic scan and then checking the logs once again, we are finding
reports of this sort:

BlockReport of 1158499 blocks got processed in 308860 msecs
BlockReport of 1159840 blocks got processed in 237925 msecs
BlockReport of 1161274 blocks got processed in 177853 msecs
BlockReport of 1162408 blocks got processed in 285094 msecs
BlockReport of 1164194 blocks got processed in 184478 msecs
BlockReport of 1165673 blocks got processed in 226401 msecs

The 3rd of these exactly straddles the particular example timeline I
discussed in my original email about this question.  I suspect I'll find
more of the same as I look through other related errors.
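
For scale, the durations in those log lines work out to roughly 3 to 5
minutes per block report. Below is a minimal Python sketch of that
arithmetic. It is not anything from Hadoop itself: the names PATTERN and
summarize are made up for illustration, and the regular expression
assumes the log format is exactly as in the lines quoted above.

```python
import re

# The six log lines quoted above, verbatim.
LOG_LINES = """\
BlockReport of 1158499 blocks got processed in 308860 msecs
BlockReport of 1159840 blocks got processed in 237925 msecs
BlockReport of 1161274 blocks got processed in 177853 msecs
BlockReport of 1162408 blocks got processed in 285094 msecs
BlockReport of 1164194 blocks got processed in 184478 msecs
BlockReport of 1165673 blocks got processed in 226401 msecs
""".splitlines()

# Assumed format: "BlockReport of <N> blocks got processed in <M> msecs".
PATTERN = re.compile(r"BlockReport of (\d+) blocks got processed in (\d+) msecs")


def summarize(lines):
    """Return (blocks, msecs, minutes, blocks_per_sec) for each matching line."""
    rows = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # skip anything that is not a block report line
        blocks = int(match.group(1))
        msecs = int(match.group(2))
        minutes = msecs / 60000.0
        blocks_per_sec = blocks / (msecs / 1000.0)
        rows.append((blocks, msecs, minutes, blocks_per_sec))
    return rows


if __name__ == "__main__":
    for blocks, msecs, minutes, rate in summarize(LOG_LINES):
        print(f"{blocks:>8} blocks  {minutes:5.2f} min  {rate:8.0f} blocks/sec")
```

Run against a full datanode log, the same loop would show whether the
long reports line up with the other errors.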

You could ask for a "complete fix" in https://issues.apache.org/jira/browse/HADOOP-4584. I don't think the current patch there fixes your problem.

Raghu.

--karl

