Frode Halvorsen created HDFS-7815:
-------------------------------------
Summary: Loop on 'blocks does not belong to any file'
Key: HDFS-7815
URL: https://issues.apache.org/jira/browse/HDFS-7815
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, namenode
Affects Versions: 2.6.0
Environment: Small cluster on Red Hat. 2 namenodes (HA), 6 datanodes
with 19TB disk for HDFS.
Reporter: Frode Halvorsen
I am currently experiencing a looping situation:
The namenode takes approximately 1:50 (min:sec) to log a massive number of lines
stating that some blocks don't belong to any file. During this time, it is
unresponsive to any requests from datanodes, and if ZooKeeper had been running,
it would have taken the namenode down (ssh-fencing: kill).
When it has finished the 'round', it starts doing some normal work, which
includes telling the datanode to delete the blocks. But before the datanode
has finished deleting the blocks and can report back to the namenode, the
namenode has started the next round of reporting the same blocks that don't
belong to any file. Thus, the datanode gets a timeout when reporting
block updates for the deleted blocks. And this, of course, repeats
itself over and over again...
There are actually two issues, I think:
1 - the namenode becomes totally unresponsive while reporting the blocks (could
this be a DEBUG line instead of an INFO line?)
2 - the namenode seems to 'forget' that it has already reported those blocks
just 2-3 minutes ago...
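
To illustrate the suggestion in the first issue, here is a minimal sketch of the logging pattern I have in mind: per-block lines are emitted only when debug logging is enabled, and a single summary line goes out at INFO. This is not the actual namenode code. The class OrphanBlockLogSketch, its methods, and the message wording are hypothetical, and it uses java.util.logging (where FINE stands in for DEBUG) only so that the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not Hadoop code: demote per-block messages to
// debug level and keep one summary line at INFO.
public class OrphanBlockLogSketch {

    // Builds the single summary line that is reported at INFO level.
    public static String summarize(int orphanCount, String datanode) {
        return orphanCount + " blocks on " + datanode
                + " do not belong to any file";
    }

    // Logs per-block detail only when FINE (debug) is enabled, then one
    // INFO summary. Returns the number of per-block lines submitted.
    public static int report(Logger log, List<String> orphanBlockIds,
                             String datanode) {
        int detailLines = 0;
        if (log.isLoggable(Level.FINE)) {
            for (String id : orphanBlockIds) {
                log.fine(id + " on " + datanode
                        + " does not belong to any file");
                detailLines++;
            }
        }
        log.info(summarize(orphanBlockIds.size(), datanode));
        return detailLines;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("OrphanBlockLogSketch");
        // Made-up block ids and datanode name, for illustration only.
        report(log, Arrays.asList("blk_1", "blk_2", "blk_3"), "dn1");
    }
}
```

With the logger at INFO, the loop is skipped entirely, so the cost of the report no longer grows with the number of orphaned blocks being logged. Whether logging is the real cause of the 1:50 stall is an assumption on my part.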
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)