I found the following in the region server log on one machine; the region server shut down shortly afterwards:
2010-03-05 23:44:37,859 WARN [DataStreamer for file /hbase/.logs/snv-it-lin-010.projectrialto.com,60020,1267695848448/hlog.dat.1267860383622] hdfs.DFSClient$DFSOutputStream(2589): Error Recovery for block blk_6820048136829787576_478281 failed because recovery from primary datanode 10.10.31.135:50010 failed 5 times. Pipeline was 10.10.31.135:50010. Will retry...
2010-03-05 23:44:38,865 WARN [DataStreamer for file /hbase/.logs/snv-it-lin-010.projectrialto.com,60020,1267695848448/hlog.dat.1267860383622] hdfs.DFSClient$DFSOutputStream(2583): Error Recovery for block blk_6820048136829787576_478281 failed because recovery from primary datanode 10.10.31.135:50010 failed 6 times. Pipeline was 10.10.31.135:50010. Aborting...
2010-03-05 23:44:38,866 ERROR [regionserver/10.10.31.135:60020] regionserver.HRegionServer(631): Unable to close log in abort
java.io.IOException: Error Recovery for block blk_6820048136829787576_478281 failed because recovery from primary datanode 10.10.31.135:50010 failed 6 times. Pipeline was 10.10.31.135:50010. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2584)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2078)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2241)
2010-03-05 23:44:38,866 DEBUG [regionserver/10.10.31.135:60020] regionserver.HRegionServer(1669): closing region ignoreTable,com.india-forums.www/forum_posts.asp?TID=1274472,1266968531214
2010-03-05 23:44:38,866 DEBUG [regionserver/10.10.31.135:60020] regionserver.HRegion(453): Closing ignoreTable,com.india-forums.www\x2Fforum_posts.asp\x3FTID\x3D1274472,1266968531214: compactions & flushes disabled
2010-03-05 23:44:38,867 DEBUG [regionserver/10.10.31.135:60020] regionserver.HRegion(470): Updates disabled for region, no outstanding scanners on ignoreTable,com.india-forums.www\x2Fforum_posts.asp\x3FTID\x3D1274472,1266968531214

Here is the result of 'fsck /hbase':
...
/hbase/domaincrawltable/116384076/txt/4886186747089330505: Under replicated blk_7285175333095642722_478442. Target Replicas is 3 but found 2 replica(s).
.......................................................
.....................................................................................
Status: HEALTHY
Total size: 13749366171 B
Total dirs: 275
Total files: 285 (Files currently being written: 2)
Total blocks (validated): 417 (avg. block size 32972101 B)
Minimally replicated blocks: 417 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 11 (2.6378896 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9736211
Corrupt blocks: 0
Missing replicas: 11 (0.88709676 %)
Number of data-nodes: 3
Number of racks: 1

If you can shed some light on how this might have happened, or on how to resolve it, that would be great.

Here is an excerpt of the log from datanode 10.10.31.135:

2010-03-05 23:41:52,333 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.31.135:50010, dest: /10.10.31.135:43906, bytes: 5011, op: HDFS_READ, cliID: DFSClient_-854338598, srvID: DS-1802582900-10.10.30.104-50010-1249540398456, blockid: blk_-8220999320966573627_478276
2010-03-05 23:44:26,112 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_REGISTER
2010-03-05 23:44:26,795 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_6820048136829787576_478281 java.nio.channels.ClosedByInterruptException
2010-03-05 23:44:26,795 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_6820048136829787576_478281 received exception java.io.IOException: Interrupted receiveBlock
2010-03-05 23:44:26,795 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.10.31.135:50010, storageID=DS-1802582900-10.10.30.104-50010-1249540398456, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Interrupted receiveBlock
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:569)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
    at java.lang.Thread.run(Thread.java:619)
2010-03-05 23:44:26,796 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6820048136829787576_478281 1 Exception java.net.SocketException: Socket closed
    at java.net.SocketInputStream.read(SocketInputStream.java:162)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
    at java.lang.Thread.run(Thread.java:619)
2010-03-05 23:44:26,796 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6820048136829787576_478281 1 : Thread is interrupted.
2010-03-05 23:44:26,796 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_6820048136829787576_478281 terminating
2010-03-05 23:44:26,797 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_6820048136829787576_478416 of size 19767296 as part of lease recovery.
2010-03-05 23:44:28,054 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-197353992341078477_457049
2010-03-05 23:44:30,798 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 80261 blocks got processed in 4000 msecs
2010-03-05 23:44:32,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8837772549203757719_478317 src: /10.10.31.137:40420 dest: /10.10.31.135:50010
2010-03-05 23:44:32,815 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-9195341056633376217_478409 src: /10.10.31.136:38576 dest: /10.10.31.135:50010
2010-03-05 23:44:32,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8837772549203757719_478317 src: /10.10.31.137:40420 dest: /10.10.31.135:50010
2010-03-05 23:44:32,815 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-9195341056633376217_478409 src: /10.10.31.136:38576 dest: /10.10.31.135:50010
2010-03-05 23:44:32,816 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9195341056633376217_478409 src: /10.10.31.136:38576 dest: /10.10.31.135:50010 of size 387
2010-03-05 23:44:32,817 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-8837772549203757719_478317 src: /10.10.31.137:40420 dest: /10.10.31.135:50010 of size 25429
2010-03-05 23:44:33,807 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-8961344666669238364_478288 file /disk3/opt/kindsight/hadoop/data/dfs/data/current/subdir22/subdir59/blk_-8961344666669238364
2010-03-05 23:44:33,807 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-6720040806211326547_478305 file /disk4/opt/kindsight/hadoop/data/dfs/data/current/subdir56/subdir5/blk_-6720040806211326547
2010-03-05 23:44:33,825 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-977571326835424573_478298 file /opt/kindsight/hadoop/data/dfs/data/current/subdir61/subdir26/blk_-977571326835424573
2010-03-05 23:44:33,825 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-878425890325495654_478304 file /disk3/opt/kindsight/hadoop/data/dfs/data/current/subdir22/subdir47/blk_-878425890325495654
2010-03-05 23:44:33,826 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_656787642109994178_478290 file /opt/kindsight/hadoop/data/dfs/data/current/subdir61/subdir26/blk_656787642109994178
2010-03-05 23:44:33,826 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_1073497695673238763_478300 file /disk3/opt/kindsight/hadoop/data/dfs/data/current/subdir22/subdir47/blk_1073497695673238763
2010-03-05 23:44:33,827 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_6820048136829787576_478281, targets=[10.10.31.135:50010])
2010-03-05 23:44:33,829 INFO org.apache.hadoop.ipc.Server: IPC Server handler 13 on 50020, call recoverBlock(blk_6820048136829787576_478281, false, [Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@44355f02) from 10.10.31.135:52441: error: org.apache.hadoop.ipc.RemoteException: java.io.IOException: blk_6820048136829787576_478281 is already commited, storedBlock == null.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4676)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:473)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: blk_6820048136829787576_478281 is already commited, storedBlock == null.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4676)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:473)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
    at org.apache.hadoop.ipc.Client.call(Client.java:739)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.nextGenerationStamp(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1550)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1524)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1590)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2010-03-05 23:44:33,831 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_1345490507338325022_478299 file /disk2/opt/kindsight/hadoop/data/dfs/data/current/subdir5/subdir57/blk_1345490507338325022
2010-03-05 23:44:33,843 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_7963383895657483589_478296 file /disk3/opt/kindsight/hadoop/data/dfs/data/current/subdir22/subdir17/blk_7963383895657483589
2010-03-05 23:44:34,835 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_6820048136829787576_478281, targets=[10.10.31.135:50010])
2010-03-05 23:44:34,837 INFO org.apache.hadoop.ipc.Server: IPC Server handler 15 on 50020, call recoverBlock(blk_6820048136829787576_478281, false, [Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@25bd85b5) from 10.10.31.135:52441: error: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block (=blk_6820048136829787576_478281) not found
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1897)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:481)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block (=blk_6820048136829787576_478281) not found
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1897)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:481)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
    at org.apache.hadoop.ipc.Client.call(Client.java:739)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.commitBlockSynchronization(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1543)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1524)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1590)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2010-03-05 23:44:35,781 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5453101949124244660_478413 src: /10.10.31.136:38577 dest: /10.10.31.135:50010
2010-03-05 23:44:35,782 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-5453101949124244660_478413 src: /10.10.31.136:38577 dest: /10.10.31.135:50010 of size 4133
2010-03-05 23:44:35,783 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5148454955563266670_478361 src: /10.10.31.136:38578 dest: /10.10.31.135:50010
2010-03-05 23:44:35,784 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-5148454955563266670_478361 src: /10.10.31.136:38578 dest: /10.10.31.135:50010 of size 121
2010-03-05 23:44:35,846 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_6820048136829787576_478281, targets=[10.10.31.135:50010])
2010-03-05 23:44:35,847 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 50020, call recoverBlock(blk_6820048136829787576_478281, false, [Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@423c489e) from 10.10.31.135:52441: error: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block (=blk_6820048136829787576_478281) not found
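For reading the fsck summary quoted earlier, it can help to recompute its percentages from the raw counts. With a target of 3 replicas per block, 11 missing replicas spread over 11 under-replicated blocks means each of those blocks is short exactly one copy, which matches the "Target Replicas is 3 but found 2" line. The Java sketch below is only a back-of-the-envelope check: the formulas are inferred from the printed numbers, not taken from Hadoop's fsck source, and it assumes every file uses the default replication factor of 3 and that no block is over-replicated (the report shows 0). The class and method names are hypothetical.

```java
// Back-of-the-envelope check of the 'fsck /hbase' summary (hypothetical helper).
// Assumption: every block targets the default replication factor of 3 and none
// is over-replicated, so live replicas = blocks * 3 - missing replicas.
public final class FsckSummaryCheck {

    // Under-replicated blocks as a percentage of all validated blocks.
    static float underReplicatedPercent(int underReplicated, int totalBlocks) {
        return (float) (underReplicated * 100) / (float) totalBlocks;
    }

    // Replicas actually present on the datanodes, under the assumption above.
    static long liveReplicas(int totalBlocks, int targetReplication, int missingReplicas) {
        return (long) totalBlocks * targetReplication - missingReplicas;
    }

    // Mean number of live replicas per block.
    static float averageReplication(long liveReplicas, int totalBlocks) {
        return (float) liveReplicas / (float) totalBlocks;
    }

    // Missing replicas relative to the live replicas found (inferred from the report's figures).
    static float missingReplicasPercent(int missingReplicas, long liveReplicas) {
        return (float) (missingReplicas * 100) / (float) liveReplicas;
    }

    // Average block size in bytes, truncated to a whole number.
    static long averageBlockSize(long totalBytes, int totalBlocks) {
        return totalBytes / totalBlocks;
    }

    public static void main(String[] args) {
        final int blocks = 417;             // Total blocks (validated)
        final int underReplicated = 11;     // Under-replicated blocks
        final int missing = 11;             // Missing replicas
        final int target = 3;               // Default replication factor
        final long totalBytes = 13749366171L; // Total size

        long live = liveReplicas(blocks, target, missing);
        System.out.println("Live replicas: " + live);
        System.out.println("Under-replicated blocks: " + underReplicatedPercent(underReplicated, blocks) + " %");
        System.out.println("Average block replication: " + averageReplication(live, blocks));
        System.out.println("Missing replicas: " + missingReplicasPercent(missing, live) + " %");
        System.out.println("Avg. block size: " + averageBlockSize(totalBytes, blocks) + " B");

        // Each under-replicated block lacks at least one replica, so equal counts
        // imply each of them is short by exactly one.
        boolean eachShortByOne = (missing == underReplicated);
        System.out.println("Each under-replicated block short by exactly one replica: " + eachShortByOne);
    }
}
```

With these inputs the sketch derives 1240 live replicas, an average block size of 32972101 B, and percentages that agree with the report's figures to within float rounding, which suggests the under-replication is limited to one lost copy per affected block rather than anything more widespread.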