The "Verification succeeded" messages are from a Datanode background housekeeping task, DataBlockScanner, which attempts to discover any replicas that have become corrupt. If it finds one (which should be rare), it tells the Namenode the replica has become corrupted, and the NN will re-replicate it from a good copy on another DN.
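To make "verification" concrete, here is a minimal Python sketch of checksum-based replica checking. It is not the actual DataBlockScanner code: the 512-byte chunk size, the use of CRC32, and every function name below are assumptions made for illustration only.

```python
import zlib
from typing import Dict, List, Tuple

# Assumed chunk size for this sketch; not taken from the thread.
BYTES_PER_CHECKSUM = 512


def compute_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Return one CRC32 value per chunk_size-byte chunk of data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def verify_replica(data: bytes, stored: List[int],
                   chunk_size: int = BYTES_PER_CHECKSUM) -> bool:
    """Recompute the checksums and compare them with the stored ones."""
    return compute_checksums(data, chunk_size) == stored


def scan(replicas: Dict[str, Tuple[bytes, List[int]]]) -> List[str]:
    """Return the ids of replicas that fail verification (hypothetical helper).

    In the scenario described above, these are the replicas a DN would
    report to the NN as corrupt.
    """
    return [
        block_id
        for block_id, (data, stored) in sorted(replicas.items())
        if not verify_replica(data, stored)
    ]
```

A replica whose bytes still match its stored checksums passes (the "Verification succeeded" case); a replica with even one flipped byte shows up in the list returned by `scan`.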
DataBlockScanner may consume up to 100% of one CPU core on the DN, but no more. It is very unlikely to have caused the DN to become unable to do its high-priority work, like sending heartbeats and responding to Clients. Unless you're running DN on single-core boxes, look to network problems or Namenode overload as more likely explanations for the problem.

One other possibility: were the "lost heartbeat" logs from startup time of a large cluster? In v20, prior to a set of startup performance improvements that a few of us did over the first few months of this year, it was not uncommon for the NN to get swamped during startup of a large cluster, and start losing heartbeats and removing healthy nodes. This was directly addressed in trunk and 20-security by HDFS-1541 (patch by Hairong Kuang).

--Matt

On May 31, 2011, at 4:10 AM, Joey Echeverria wrote:

How much memory do you have on your DataNode? Is it possible that you're swapping?

-Joey

On Mon, May 30, 2011 at 11:09 PM, ccxixicc <ccxix...@foxmail.com> wrote:
>
> Hi,all
> I found NameNode often lost heartbeat from DataNodes:
> org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost
> heartbeat from 192.168.1.101:50010
> org.apache.hadoop.net.NetworkTopology: Removing a node:
> /default-rack/102.168.1.101:50010
>
> meanwhile NN logs:
> org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> blockMap updated: 192.168.1.102:50010 is added to blk_16634224072...
>
> And DN logs:
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_1820616086..
>
> There's no DFSClients, I do nothing, What are the NN and DN doing? Almost
> 100% cpu. Is this why NN lost heartbeat from DN?
>
> Thanks.
>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434