On Fri, Feb 20, 2009 at 2:49 PM, Larry Compton <[email protected]> wrote:
> I'm having problems with my region servers dying. Region server and data
> node log snippets are found below. Here's a synopsis of my configuration...
> - 4 nodes
> - Hadoop/HBase 0.19.0
> - dfs.datanode.max.xcievers - 2048
> - dfs.datanode.socket.write.timeout - 0
> - file handle limit - 32768
> - fsck - healthy

Thanks for reporting what you have configured above. What size are your table, regions, and rows?

Is dfs.datanode.socket.write.timeout=0 set in a context that hbase can see it? I.e., is it in hbase-site.xml, or is it in hadoop-site.xml and symlinked under the hbase/conf dir so hbase picks it up? Going by the errors below, its absence could be the explanation.
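For example, a minimal sketch of what I mean (the value is the one from your synopsis; it has to be in a file the hbase JVMs actually load):

  <!-- in hbase-site.xml, or in a hadoop-site.xml that is on hbase's
       classpath: disable the DFSClient write timeout -->
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>

Or symlink your hadoop config under hbase's conf dir so hbase picks up whatever hadoop-site.xml already has (paths below are just placeholders; adjust for your install):

  ln -s /path/to/hadoop/conf/hadoop-site.xml /path/to/hbase/conf/hadoop-site.xml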
Yours,
St.Ack

> I'm seeing DataXceiver errors in the data node log, but not the sort that
> indicates that the max.xcievers value is too small. Any idea what might be
> wrong?
>
> HBASE REGION SERVER LOG OUTPUT...
> 2009-02-20 08:50:42,476 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.net.SocketTimeoutException: 5000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.6.38:56737 remote=/192.168.6.38:50010]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>
> 2009-02-20 08:50:42,918 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.net.SocketTimeoutException: 5000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.6.38:56646 remote=/192.168.6.38:50010]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>
> 2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_2604922956617757726_298427 bad datanode[0]
> 192.168.6.38:50010
> 2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-3747640666687562371_298377 bad datanode[0]
> 192.168.6.38:50010
> 2009-02-20 08:50:44,356 FATAL org.apache.hadoop.hbase.regionserver.HLog:
> Could not append. Requesting close of log
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,357 ERROR
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
> failed for region
> medline,_X2dX5031454eX3aX11f48751c5eX3aXX2dX725c,1235136902878
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,377 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException:
> All datanodes 192.168.6.38:50010 are bad. Aborting...
> 2009-02-20 08:50:44,377 FATAL
> org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,378 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region medline,"blood",1235125955035
> 2009-02-20 08:50:44,380 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 6 on 60020, call batchUpdates([...@ecb0da,
> [Lorg.apache.hadoop.hbase.io.BatchUpdate;@14ed87c) from 192.168.6.29:47457:
> error: java.io.IOException: All datanodes 192.168.6.38:50010 are bad.
> Aborting...
> java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
> 2009-02-20 08:50:44,418 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=2581, regions=71, stores=212, storefiles=352, storefileIndexSize=31,
> memcacheSize=574, usedHeap=1190, maxHeap=1984
>
> DATANODE LOG OUTPUT...
> 2009-02-20 08:50:45,337 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-3747640666687562371_298377 0 Exception java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:797)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2009-02-20 08:50:45,337 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
> blk_-3747640666687562371_298377 terminating
> 2009-02-20 08:50:45,337 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_-3747640666687562371_298377 received exception java.io.EOFException:
> while trying to read 32873 bytes
> 2009-02-20 08:50:45,337 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_2604922956617757726_298427 0 Exception java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:115)
>         at java.io.DataOutputStream.writeShort(DataOutputStream.java:150)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:798)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2009-02-20 08:50:45,338 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
> blk_2604922956617757726_298427 terminating
> 2009-02-20 08:50:45,338 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_2604922956617757726_298427 received exception java.io.EOFException:
> while trying to read 49299 bytes
> 2009-02-20 08:50:45,342 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.6.38:50010, dest: /192.168.6.38:56791, bytes: 3318, op: HDFS_READ,
> cliID: DFSClient_1697856093, srvID:
> DS-697440498-192.168.6.38-50010-1233008986086, blockid:
> blk_-4029959142608094898_296648
> 2009-02-20 08:50:46,680 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.6.38:50010,
> storageID=DS-697440498-192.168.6.38-50010-1233008986086,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 32873 bytes
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
>         at java.lang.Thread.run(Thread.java:619)
> 2009-02-20 08:50:46,680 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.6.38:50010,
> storageID=DS-697440498-192.168.6.38-50010-1233008986086,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 49299 bytes
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
>         at java.lang.Thread.run(Thread.java:619)
>
> Larry
