Hi,

We ran into a strange situation when restarting the NameNode: it cannot leave safe mode automatically. It reports "The ratio of reported blocks 0.9986 has not reached the threshold 0.999". Our cluster has 83,276,820 blocks in total, so if the counter is right, we are missing about 116,587 blocks. But fsck reported that 83,276,779 blocks were healthy and 37 blocks belonged to open files. Only 4 blocks were marked as corrupt, because their length is shorter than that of the existing ones. If the fsck result can be trusted, the ratio is higher than 0.999999 and the threshold has been reached.
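The arithmetic behind these figures can be checked with a short Python sketch. The variable names are illustrative only, and counting just the fsck "healthy" blocks as safe is an assumption made for this comparison, not a statement of how the NameNode computes its ratio.

```python
# Figures quoted in the message above.
total_blocks = 83_276_820
reported_ratio = 0.9986   # from the safe mode message (already rounded)
threshold = 0.999         # safe mode threshold from the same message

fsck_healthy = 83_276_779
fsck_open = 37
fsck_corrupt = 4

# Blocks that would be missing if the NameNode's ratio were accurate.
missing_by_counter = int(total_blocks * (1 - reported_ratio))

# Blocks that fsck does not report as healthy.
not_healthy_by_fsck = fsck_open + fsck_corrupt

# Ratio implied by fsck, assuming only healthy blocks count as safe.
fsck_ratio = fsck_healthy / total_blocks

print("missing according to the safe mode ratio:", missing_by_counter)
print("not healthy according to fsck:", not_healthy_by_fsck)
print("ratio implied by fsck:", fsck_ratio)
print("fsck ratio reaches threshold:", fsck_ratio >= threshold)
```

The three fsck categories add up to the full block count, so the gap between the two views comes entirely from the safe mode counter.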
I suspect the blockSafe counter is not being maintained accurately. Is that possible? Our case looks similar to the one described in this JIRA issue: https://issues.apache.org/jira/browse/HADOOP-2159 (our Hadoop release already includes this patch). Any suggestions?

Wei