Thanks for the info, Stack.

The same problem happened again, this time on a different region server. Log
snippets appear below, including the JVM info from the region server log. All
of the region servers and the master are configured identically. Here's a
more complete synopsis:
- 4 nodes
- Hadoop/HBase 0.19.0
- dfs.datanode.max.xcievers - 2048
- dfs.datanode.socket.write.timeout - 0
- file handle limit - 32768
- fsck - healthy
- "hadoop-site.xml" symlinked into the Hbase "conf" directory on all four
nodes (previous run did this via HBASE_CLASSPATH)
- number of regions - 302
- size of table - 93.6GB (not sure about the number of rows, but I'll run an
MR job if it's needed; see the RowCounter note below)
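
For reference, here is roughly what the relevant hadoop-site.xml entries
look like (a minimal sketch; the property names and values are just the
ones from the list above):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <!-- 0 is intended to disable the datanode write timeout -->
    <value>0</value>
  </property>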
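
In case a row count turns out to be useful: the MR job I have in mind is
the RowCounter that ships with HBase. From memory the invocation is
something like the line below, though the exact arguments may differ in
0.19.0:

  bin/hbase org.apache.hadoop.hbase.mapred.RowCounter <outputdir> <tablename> <column1> [<column2>...]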

HBASE REGION SERVER LOG SNIPPET:
Wed Feb 25 18:00:17 EST 2009 Starting regionserver on blackbook8
ulimit -n 32768
2009-02-25 18:00:18,717 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
vmInputArguments=[-Xmx2000m, -XX:+HeapDumpOnOutOfMemoryError,
-Dhbase.log.dir=/home/hadoop/pkg/hbase/bin/../logs,
-Dhbase.log.file=hbase-hadoop-regionserver-blackbook8.log,
-Dhbase.home.dir=/home/hadoop/pkg/hbase/bin/.., -Dhbase.id.str=hadoop,
-Dhbase.root.logger=INFO,DRFA,
-Djava.library.path=/home/hadoop/pkg/hbase/bin/../lib/native/Linux-i386-32]
2009-02-25 18:00:18,830 INFO
org.apache.hadoop.hbase.regionserver.MemcacheFlusher:
globalMemcacheLimit=793.9m, globalMemcacheLimitLowMark=496.2m, maxHeap=1.9g
2009-02-25 18:00:18,836 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 10000000ms

...

2009-02-26 11:49:25,997 INFO org.apache.hadoop.hbase.regionserver.HLog:
Closed
hdfs://blackbook38:55310/hbase/log_192.168.6.8_1235602818902_60020/hlog.dat.1235666927118,
entries=100001. New log writer:
/hbase/log_192.168.6.8_1235602818902_60020/hlog.dat.1235666965925
2009-02-26 11:49:42,631 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.net.SocketTimeoutException: 5000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.6.8:52614
remote=/192.168.6.8:50010]
    at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
    at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
    at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

2009-02-26 11:49:42,799 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_-4369885596245304704_352361 bad datanode[0]
192.168.6.8:50010
2009-02-26 11:49:43,902 FATAL org.apache.hadoop.hbase.regionserver.HLog:
Could not append. Requesting close of log
java.io.IOException: All datanodes 192.168.6.8:50010 are bad. Aborting...
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-26 11:49:44,185 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All
datanodes 192.168.6.8:50010 are bad. Aborting...
2009-02-26 11:49:44,186 FATAL
org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:

java.io.IOException: All datanodes 192.168.6.8:50010 are bad. Aborting...
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-02-26 11:49:45,497 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=3635, regions=75, stores=225, storefiles=381, storefileIndexSize=35,
memcacheSize=284, usedHeap=697, maxHeap=1984
2009-02-26 11:49:45,497 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
LogRoller exiting.
2009-02-26 11:49:45,538 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020, call batchUpdates([...@193baab,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@1c43576) from 192.168.6.29:45312:
error: java.io.IOException: All datanodes 192.168.6.8:50010 are bad.
Aborting...
java.io.IOException: All datanodes 192.168.6.8:50010 are bad. Aborting...
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)

DATANODE LOG SNIPPET:
2009-02-26 11:49:44,389 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-4369885596245304704_352361 0 Exception java.net.SocketException: Broken
pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:115)
        at java.io.DataOutputStream.writeShort(DataOutputStream.java:150)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:798)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
        at java.lang.Thread.run(Thread.java:619)

2009-02-26 11:49:44,390 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-4369885596245304704_352361 terminating
2009-02-26 11:49:44,390 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-4369885596245304704_352361 received exception java.io.EOFException:
while trying to read 32873 bytes
2009-02-26 11:49:45,097 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
192.168.6.8:50010, storageID=DS-1828418559-192.168.6.8-50010-1233008401196,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 32873 bytes
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
        at java.lang.Thread.run(Thread.java:619)
