On Sun, Oct 30, 2011 at 5:53 PM, Vladimir Rodionov
<[email protected]> wrote:
> Yes, MSLAB is enabled. It allocates 2MB per region by default, correct? That would
> explain the ~2.4G of heap usage (1277 regions).
> We fixed the OOME by increasing the heap to 8G. Now I know that we can decrease the
> slab size and get back to 4G.

Yep. You can decrease the slab size or disable MSLAB entirely. Or consider
having fewer, larger regions per server.
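
IIRC the relevant hbase-site.xml properties are the ones below (the values are
just illustrative, not tuned recommendations for your cluster):

  <!-- Shrink the MSLAB chunk size (default is 2097152 bytes = 2MB per memstore). -->
  <property>
    <name>hbase.hregion.memstore.mslab.chunksize</name>
    <value>1048576</value>
  </property>

  <!-- Or turn MSLAB off entirely. -->
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>false</value>
  </property>

  <!-- Raising the split size gives you fewer, larger regions over time (illustrative 1G). -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
  </property>

With ~1277 regions at 2MB per chunk you are pinning roughly 1277 x 2MB ~= 2.5G of
heap before the memstores even fill up, which is in the ballpark of the 2.4G you
mentioned.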

-Todd
>
> ________________________________________
> From: Todd Lipcon [[email protected]]
> Sent: Sunday, October 30, 2011 5:12 PM
> To: [email protected]
> Subject: Re: HBASE-4107 and OOME
>
> Hi Vladimir,
>
> Do you have MSLAB enabled? My guess is that with 1000 regions you're seeing
> a lot of memory usage from MSLAB. Can you try the patch from
> HBASE-3680 to see what the "wasted memory" from MSLABs is?
>
> -Todd
>
> On Sun, Oct 30, 2011 at 4:54 PM, Vladimir Rodionov
> <[email protected]> wrote:
>>
>> We have been observing frequent OOMEs under test load on a small cluster (15
>> nodes), but the number of regions is
>> quite high (~1K per region server).
>>
>> It seems that we are constantly hitting the HBASE-4107 bug.
>>
>> 2011-10-29 07:23:19,963 INFO org.apache.hadoop.io.compress.CodecPool: Got 
>> brand-new compressor
>> 2011-10-29 07:23:23,171 DEBUG 
>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=30.07 MB, 
>> free=764.53 MB, max=794.6 MB, blocks=418, accesses=198528211, 
>> hits=196714784, hitRatio=99.08%%, cachingAccesses=196715094, 
>> cachingHits=196714676, cachingHitsRatio=99.99%%, evictions=0, evicted=0, 
>> evictedPerRun=NaN
>> 2011-10-29 07:23:55,858 INFO org.apache.hadoop.io.compress.CodecPool: Got 
>> brand-new compressor
>> 2011-10-29 07:26:43,776 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: 
>> Could not append. Requesting close of hlog
>> java.io.IOException: Reflection
>>        at 
>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1002)
>>        at 
>> org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:979)
>> Caused by: java.lang.reflect.InvocationTargetException
>>        at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>>        at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at 
>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>>        ... 2 more
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2204)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3086)
>>        at 
>> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>>        at 
>> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3169)
>>        at 
>> org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>>        at 
>> org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>>        ... 6 more
>> 2011-10-29 07:26:43,809 INFO org.apache.hadoop.io.compress.CodecPool: Got 
>> brand-new compressor
>>
>> The most interesting part is heap dump analysis:
>>
>> Heap Size = 4G
>>
>> byte[] arrays consume 86% of the heap
>> 68% of the overall heap is reachable from MemStore instances
>> (MemStore -> KeyValueSkipListSet -> ConcurrentSkipListMap)
>>
>> I am not saying that this is a memory leak, but taking into account that the
>> default global MemStore limit is 40% of 4G = 1.6G,
>> 68% looks very suspicious.
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>>
>> ________________________________________
>> From: Ted Yu [[email protected]]
>> Sent: Saturday, October 29, 2011 3:56 PM
>> To: [email protected]
>> Subject: test failure due to missing baseznode
>>
>> If you happen to see a test failure similar to the following:
>> https://builds.apache.org/job/PreCommit-HBASE-Build/99//testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFilesSplitRecovery/testBulkLoadPhaseRecovery/
>>
>> Please go over https://issues.apache.org/jira/browse/HBASE-4253
>>
>> and apply a fix similar to the following:
>> https://issues.apache.org/jira/secure/attachment/12491621/HBASE-4253.patch
>>
>> Cheers
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera
