stack wrote:
Thanks for writing this note. I'd seen the IOE and presumed it was a known issue in hadoop 0.17. I then went looking for it but couldn't find one. Turns out it's a silly issue in hbase (HBASE-761).

How many regions do you have loaded up in your regionserver?

Are you using the default heap size? You might try upping it a little. Also make sure you have upped the open file descriptors limit (see #4 and #6 in the FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ).
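
For reference, the heap goes in conf/hbase-env.sh and the file descriptor limit is set per-user at the OS level; something like the below (treat it as a sketch -- the exact hbase-env.sh variable name, the username, and the limit value depend on your setup):

----------------------------------
# conf/hbase-env.sh -- give the HBase JVMs a bigger heap, in MB
export HBASE_HEAPSIZE=1000

# /etc/security/limits.conf -- raise the open file descriptor limit for the
# user running hbase/hdfs ('hadoop' here is just an example username; log
# out and back in for it to take effect)
hadoop  -  nofile  32768

# verify the new limit from that user's shell
ulimit -n
----------------------------------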


thanks St.Ack,

we've rerun our tests, uncommented HEAP_SIZE=1000 in our hbase-env, and updated trunk to the latest. we haven't run into an OOME so far, and there's no sign of all those compaction errors anymore (thanks!).

we have 5 regions loaded up in our regionserver

our test hammers hbase as we import a bunch of files into our hbase tables. now what we see is that hbase seems to block such that nothing gets imported. everything is up and running (hdfs namenode/datanodes, hbase master/regionserver).
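
for reference, the import side is basically a tight loop of single-row writes against the table, roughly like the sketch below (table/column names and row keys are made up, and the client calls are the 0.2-era BatchUpdate API as we understand it, so the exact signatures may be slightly off):

----------------------------------
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class Importer {
  public static void main(String[] args) throws Exception {
    // picks up hbase-site.xml from the classpath and opens the table
    HTable table = new HTable(new HBaseConfiguration(), "dmls");

    // one BatchUpdate (row write) per record parsed out of the input files;
    // the row key and value here stand in for the real parsed data
    for (int i = 0; i < 1000000; i++) {
      BatchUpdate update = new BatchUpdate("row-" + i);
      update.put("data:content", ("value-" + i).getBytes());
      table.commit(update); // this is the call that hangs once updates are blocked
    }
  }
}
----------------------------------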

below are the logs we see from our hbase regionserver and from the hdfs datanode:

----------------------------------
hbase regionserver
---------------------------------
2008-07-22 16:37:10,419 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 396381, skipped 1, 315153
2008-07-22 16:37:10,672 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 258344, skipped 1, 204628
2008-07-22 16:37:10,856 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 342049, skipped 1, 271105
2008-07-22 16:37:11,080 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 237573, skipped 1, 188148
2008-07-22 16:37:11,280 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 78030028, skipped 1, 39326312
2008-07-22 16:37:33,464 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 366849, skipped 1, 291596
2008-07-22 16:37:33,928 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 984138, skipped 2, 897309
2008-07-22 16:37:34,201 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 67294, skipped 1, 51834
2008-07-22 16:37:34,643 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 45569, skipped 2, 41020
2008-07-22 16:37:35,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region dmls,,1216768730386 in 25sec
2008-07-22 16:38:00,151 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region dmls,,1216768730386
2008-07-22 16:38:00,157 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 413508, skipped 2, 396083
2008-07-22 16:38:00,502 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 269288, skipped 2, 258024
2008-07-22 16:38:00,789 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 356584, skipped 2, 341743
2008-07-22 16:38:01,032 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 247637, skipped 2, 237273
2008-07-22 16:38:01,337 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 83129653, skipped 0, 0
2008-07-22 16:38:42,495 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 382791, skipped 2, 366533
2008-07-22 16:38:42,724 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 1024454, skipped 2, 897309
2008-07-22 16:38:42,959 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 70464, skipped 2, 66950
2008-07-22 16:38:43,377 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 47891, skipped 2, 41020
2008-07-22 16:38:43,716 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region dmls,,1216768730386 in 43sec
2008-07-22 16:38:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:40:03,931 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:41:13,943 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:42:23,955 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 0 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:43:33,968 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:44:34,095 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 5 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:45:44,109 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 3 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:46:54,122 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:48:04,135 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:48:18,116 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 26294
2008-07-22 16:48:18,133 INFO org.apache.hadoop.hbase.regionserver.HLog: New log writer created at /hbase/log_64.62.244.2_1216768695534_60020/hlog.dat.1216770498127
2008-07-22 16:49:14,165 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
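
(as far as we can tell, the 64.0m threshold is just the default memcache flush size -- we haven't overridden it. assuming the property name in hbase-default.xml is what trunk still uses, the setting involved would be something like this in hbase-site.xml:)

----------------------------------
<property>
  <name>hbase.hregion.memcache.flush.size</name>
  <!-- 64MB default; updates block once the memcache hits the blocking size derived from this -->
  <value>67108864</value>
</property>
----------------------------------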


---------------------------------------
hdfs datanode
--------------------------------------

2008-07-22 17:10:41,828 WARN org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010:Got exception while serving blk_-355911506373371046 to /64.62.244.2:
java.io.IOException: Connection reset by peer
   at sun.nio.ch.FileDispatcher.write0(Native Method)
   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
   at sun.nio.ch.IOUtil.write(IOUtil.java:75)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
   at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
   at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
   at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
   at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
   at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
   at java.lang.Thread.run(Thread.java:619)

2008-07-22 17:10:41,828 ERROR org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010:DataXceiver: java.io.IOException: Connection reset by peer
   at sun.nio.ch.FileDispatcher.write0(Native Method)
   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
   at sun.nio.ch.IOUtil.write(IOUtil.java:75)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
   at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
   at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
   at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
   at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
   at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
   at java.lang.Thread.run(Thread.java:619)

2008-07-22 17:10:41,834 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010 Served block blk_8154627018748859939 to /64.62.244.2
2008-07-22 17:10:41,843 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010 Served block blk_6062812905615890844 to /64.62.244.2
2008-07-22 17:10:41,987 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010 Served block blk_-3883168343557890264 to /64.62.244.2



St.Ack

leith wrote:
looking at our region logs, we've noticed that the compaction thread constantly runs into exceptions. the entire log is filled with something like this:

----------------------------------
2008-07-22 12:29:52,759 WARN org.apache.hadoop.hbase.regionserver.HStore: Exception closing reader for 242866774/new
java.io.IOException: Stream closed
   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.close(DFSClient.java:1319)
   at java.io.FilterInputStream.close(FilterInputStream.java:155)
   at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1581)
   at org.apache.hadoop.io.MapFile$Reader.close(MapFile.java:577)
   at org.apache.hadoop.hbase.regionserver.HStore.closeCompactionReaders(HStore.java:917)
   at org.apache.hadoop.hbase.regionserver.HStore.compactHStoreFiles(HStore.java:910)
   at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:787)
   at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:887)
   at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:847)
   at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:84)
-------------------------------------

the regionserver is taking up a good amount of memory on our system, and nothing is happening except for, i assume, compaction/split processes.

these only seem to be warnings, but there are so many of them that it would be nice to get a second opinion on this.

we've also gotten an 'out of memory' exception a few times from the compaction thread, and those actually ended up killing the thread, resulting in the region server shutting itself down.

thanks,

/leith


