Hi Vladimir,

Do you have MSLAB enabled? My guess is that with 1000 regions you're seeing a lot of memory usage from MSLAB. Can you try the patch from HBASE-3680 to see what the "wasted memory" from MSLABs is?
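For reference, MSLAB is tuned from hbase-site.xml. A minimal sketch of enabling it, assuming the 0.90-era property names (the chunk-size and max-allocation values shown are, I believe, the defaults):

    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>
    <property>
      <!-- every active memstore pins at least one chunk of this size -->
      <name>hbase.hregion.memstore.mslab.chunksize</name>
      <value>2097152</value>  <!-- 2MB -->
    </property>
    <property>
      <!-- KeyValues larger than this bypass the MSLAB and go straight to the heap -->
      <name>hbase.hregion.memstore.mslab.max.allocation</name>
      <value>262144</value>  <!-- 256KB -->
    </property>

The back-of-the-envelope math: every active memstore holds at least one 2MB chunk, so ~1000 regions means on the order of 1000 x 2MB = ~2GB of byte[] pinned even by mostly-idle memstores, and the slack in partially filled chunks is not charged against the memstore limit. That would be consistent with MemStore retaining 68% of a heap whose memstore ceiling (hbase.regionserver.global.memstore.upperLimit = 0.4, i.e. nominally 1.6G of your 4G) is 40%. A jmap -histo:live <regionserver pid> histogram dominated by 2MB byte[] instances would point the same way.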
-Todd

On Sun, Oct 30, 2011 at 4:54 PM, Vladimir Rodionov <[email protected]> wrote:
>
> We have been observing frequent OOMEs during test load on a small cluster (15
> nodes), but the number of regions is quite high (~1K per region server).
>
> It seems that we are constantly hitting the HBASE-4107 bug.
>
> 2011-10-29 07:23:19,963 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-10-29 07:23:23,171 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=30.07 MB, free=764.53 MB, max=794.6 MB, blocks=418, accesses=198528211, hits=196714784, hitRatio=99.08%%, cachingAccesses=196715094, cachingHits=196714676, cachingHitsRatio=99.99%%, evictions=0, evicted=0, evictedPerRun=NaN
> 2011-10-29 07:23:55,858 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-10-29 07:26:43,776 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1002)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:979)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>         ... 2 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2204)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3086)
>         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3169)
>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>         ... 6 more
> 2011-10-29 07:26:43,809 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>
> The most interesting part is the heap dump analysis:
>
> Heap size = 4G
>
> byte[] consumes 86% of the heap
> 68% of the overall heap is reachable from MemStore instances:
> MemStore -> KeyValueSkipListSet -> ConcurrentSkip
>
> I am not saying that this is a memory leak, but taking into account that
> the default MemStore limit is 40% of 4G = 1.6G, 68% looks very suspicious.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Ted Yu [[email protected]]
> Sent: Saturday, October 29, 2011 3:56 PM
> To: [email protected]
> Subject: test failure due to missing baseznode
>
> If you happen to see a test failure similar to the following:
> https://builds.apache.org/job/PreCommit-HBASE-Build/99//testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFilesSplitRecovery/testBulkLoadPhaseRecovery/
>
> please go over https://issues.apache.org/jira/browse/HBASE-4253
>
> and apply a similar fix to the following:
> https://issues.apache.org/jira/secure/attachment/12491621/HBASE-4253.patch
>
> Cheers

--
Todd Lipcon
Software Engineer, Cloudera
