Actually, I spoke too soon. In "hbase-env.sh", we have "HBASE_CLASSPATH" set to include the Hadoop conf directory on all 4 nodes, so the HBase servers should already have access to all of the Hadoop parameters. I'm going to try a symlink to "hadoop-site.xml" anyway and see if the behavior changes.
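
For reference, this is the relevant line (a sketch; $HADOOP_HOME stands in for our actual Hadoop install path):

  # conf/hbase-env.sh, same on each node
  export HBASE_CLASSPATH=$HADOOP_HOME/conf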

Larry

On Wed, Feb 25, 2009 at 5:33 PM, Larry Compton <[email protected]> wrote:

> "dfs.datanode.socket.write.timeout" is set in "hadoop-site.xml" and isn't
> linked or contained in the Hbase "conf" directory. I'll try that out. I'm
> not sure I understand why this is necessary, though. It seems like this
> parameter would only matter to Hadoop, so why is it necessary for the Hbase
> servers to have access to it?
>
> Also, I've been looking at the HBase Wiki and at the content stored in my HBase directory in HDFS. I can easily get the size in bytes of my table using "hadoop fs -dus", but I don't know how to get the number of regions. Are the regions the subdirectories directly beneath the table directory? Also, what's a fast way to find out the number of rows? I've been trying to use "count" in "hbase shell", but I keep getting scanner timeouts.
>
>
> On Sat, Feb 21, 2009 at 12:45 AM, stack <[email protected]> wrote:
>
>> On Fri, Feb 20, 2009 at 2:49 PM, Larry Compton <[email protected]> wrote:
>>
>> > I'm having problems with my region servers dying. Region server and data node log snippets are found below. Here's a synopsis of my configuration (config excerpts follow the list)...
>> > - 4 nodes
>> > - Hadoop/Hbase 0.19.0
>> > - dfs.datanode.max.xcievers - 2048
>> > - dfs.datanode.socket.write.timeout - 0
>> > - file handle limit - 32768
>> > - fsck - healthy
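>> >
>> > The dfs settings are in "hadoop-site.xml", i.e.:
>> >
>> >   <property>
>> >     <name>dfs.datanode.max.xcievers</name>
>> >     <value>2048</value>
>> >   </property>
>> >   <property>
>> >     <name>dfs.datanode.socket.write.timeout</name>
>> >     <value>0</value>
>> >   </property>
>> >
>> > The file handle limit is raised in /etc/security/limits.conf (a sketch; assumes the daemons run as a "hadoop" user):
>> >
>> >   hadoop  -  nofile  32768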
>>
>>
>> Thanks for reporting what you have configured above.  What size is the table, and how many regions and rows?
>>
>> Is dfs.datanode.socket.write.timeout=0 set in a context where hbase can see it?  i.e., is it in hbase-site.xml, or is it in hadoop-site.xml and symlinked under the hbase/conf dir so hbase picks it up?  Going by the errors below, its absence could be the explanation.
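>>
>> E.g., a symlink like this ought to do it (a sketch; $HADOOP_HOME/$HBASE_HOME stand in for your install paths):
>>
>>   ln -s $HADOOP_HOME/conf/hadoop-site.xml $HBASE_HOME/conf/hadoop-site.xml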
>>
>> Yours,
>> St.Ack
>>
>>
>> >
>> > I'm seeing DataXceiver errors in the data node log, but not the sort that indicates that the max.xcievers value is too small. Any idea what might be wrong?
>> >
>> > HBASE REGION SERVER LOG OUTPUT...
>> > 2009-02-20 08:50:42,476 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.6.38:56737 remote=/192.168.6.38:50010]
>> >        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>> >        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>> >        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>> >        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>> >        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>> >
>> > 2009-02-20 08:50:42,918 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.6.38:56646 remote=/192.168.6.38:50010]
>> >        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
>> >        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>> >        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>> >        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>> >        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)
>> >
>> > 2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_2604922956617757726_298427 bad datanode[0] 192.168.6.38:50010
>> > 2009-02-20 08:50:43,023 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3747640666687562371_298377 bad datanode[0] 192.168.6.38:50010
>> > 2009-02-20 08:50:44,356 FATAL org.apache.hadoop.hbase.regionserver.HLog: Could not append. Requesting close of log
>> > java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
>> > 2009-02-20 08:50:44,357 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region medline,_X2dX5031454eX3aX11f48751c5eX3aXX2dX725c,1235136902878
>> > java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
>> > 2009-02-20 08:50:44,377 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>> > 2009-02-20 08:50:44,377 FATAL org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
>> > java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
>> > 2009-02-20 08:50:44,378 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region medline,"blood",1235125955035
>> > 2009-02-20 08:50:44,380 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020, call batchUpdates([...@ecb0da, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@14ed87c) from 192.168.6.29:47457: error: java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>> > java.io.IOException: All datanodes 192.168.6.38:50010 are bad. Aborting...
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2442)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1997)
>> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
>> > 2009-02-20 08:50:44,418 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=2581, regions=71, stores=212, storefiles=352, storefileIndexSize=31, memcacheSize=574, usedHeap=1190, maxHeap=1984
>> >
>> > DATANODE LOG OUTPUT...
>> > 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-3747640666687562371_298377 0 Exception java.net.SocketException: Broken pipe
>> >        at java.net.SocketOutputStream.socketWrite0(Native Method)
>> >        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>> >        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>> >        at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:797)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
>> >        at java.lang.Thread.run(Thread.java:619)
>> >
>> > 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-3747640666687562371_298377 terminating
>> > 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-3747640666687562371_298377 received exception java.io.EOFException: while trying to read 32873 bytes
>> > 2009-02-20 08:50:45,337 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_2604922956617757726_298427 0 Exception java.net.SocketException: Broken pipe
>> >        at java.net.SocketOutputStream.socketWrite0(Native Method)
>> >        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>> >        at java.net.SocketOutputStream.write(SocketOutputStream.java:115)
>> >        at java.io.DataOutputStream.writeShort(DataOutputStream.java:150)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:798)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:820)
>> >        at java.lang.Thread.run(Thread.java:619)
>> >
>> > 2009-02-20 08:50:45,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_2604922956617757726_298427 terminating
>> > 2009-02-20 08:50:45,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2604922956617757726_298427 received exception java.io.EOFException: while trying to read 49299 bytes
>> > 2009-02-20 08:50:45,342 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.6.38:50010, dest: /192.168.6.38:56791, bytes: 3318, op: HDFS_READ, cliID: DFSClient_1697856093, srvID: DS-697440498-192.168.6.38-50010-1233008986086, blockid: blk_-4029959142608094898_296648
>> > 2009-02-20 08:50:46,680 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.6.38:50010, storageID=DS-697440498-192.168.6.38-50010-1233008986086, infoPort=50075, ipcPort=50020):DataXceiver
>> > java.io.EOFException: while trying to read 32873 bytes
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
>> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
>> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
>> >        at java.lang.Thread.run(Thread.java:619)
>> > 2009-02-20 08:50:46,680 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.6.38:50010, storageID=DS-697440498-192.168.6.38-50010-1233008986086, infoPort=50075, ipcPort=50020):DataXceiver
>> > java.io.EOFException: while trying to read 49299 bytes
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:341)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
>> >        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
>> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
>> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
>> >        at java.lang.Thread.run(Thread.java:619)
>> >
>> > Larry
>> >
>>
>
>
>
