[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581499#comment-16581499 ]

Mike Drob commented on HBASE-21031:
-----------------------------------

bq. +        LOG.error("replay failed, that's expected", t);
I think the test is better without this: we already fail the test if the
replay doesn't fail, so these are unneeded lines in the log. A stack trace
logged at ERROR level in particular will stick out and make other failures
harder to diagnose.
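A minimal sketch of the pattern suggested above, assuming plain Java without any particular test framework: the expected failure is captured and handed back to the caller instead of being logged, and the check itself fails if nothing was thrown. `ExpectFailure` and `expectFailure` are hypothetical names, not part of HBase or JUnit.

```java
import java.util.Objects;

// Hypothetical helper: runs an action that is expected to throw and returns what it threw.
final class ExpectFailure {
    private ExpectFailure() {}

    // If the action completes normally, the check fails with an AssertionError.
    // The caught Throwable is returned rather than logged, so no stack trace lands in the test log.
    static Throwable expectFailure(Runnable action) {
        Objects.requireNonNull(action, "action");
        try {
            action.run();
        } catch (Throwable t) {
            // Expected path: give the failure back instead of calling LOG.error(..., t).
            return t;
        }
        throw new AssertionError("expected the action to fail, but it completed normally");
    }
}
```

A caller can then assert on the returned throwable quietly, for example `assert t instanceof IllegalStateException`, and only a missing failure produces output.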

> Memory leak if replay edits failed during region opening
> --------------------------------------------------------
>
>                 Key: HBASE-21031
>                 URL: https://issues.apache.org/jira/browse/HBASE-21031
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21031.branch-2.0.001.patch, 
> HBASE-21031.branch-2.0.002.patch, HBASE-21031.branch-2.0.003.patch, 
> HBASE-21031.branch-2.0.004.patch, HBASE-21031.branch-2.0.005.patch, 
> memoryleak.png
>
>
> Due to HBASE-21029, when replaying edits with many identical cells, the
> memstore won't flush, and an exception is thrown once all heap space has been used:
> {code}
> 2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., starting to roll back the global memstore size.
> java.lang.OutOfMemoryError: Java heap space
>         at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>         at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>         at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
>         at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
>         at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
>         at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
>         at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
>         at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
>         at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
>         at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
>         at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
>         at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
>         at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
>         at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
>         at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
> {code}
> After this exception, the memstore was not rolled back, and since MSLAB is
> used, none of the allocated chunks are ever released. That memory is leaked
> forever.
> We need to roll back the memory if opening the region fails (for now, only the
> global memstore size is decreased after a failure).
> Another problem is that we use replayEditsPerRegion in RegionServerAccounting
> to record how much memory is used during replay, and decrease the global
> memstore size by that amount if replay fails. This is not right: we may also
> flush the memstore during replay, so the size recorded in the
> replayEditsPerRegion map is not accurate at all!
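To make the leak described in the issue concrete, here is a toy model, not HBase's MSLAB or ChunkCreator: a hypothetical pool that only counts bytes handed out, and a hypothetical region whose open() replays edits into pooled chunks. Exceeding a byte budget stands in for the OutOfMemoryError. The `finally` block shows the idea of returning every chunk when the open does not complete; with rollback disabled, the chunks stay counted forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared chunk pool: it only counts bytes handed out and not yet returned.
final class ToyChunkPool {
    final AtomicLong outstandingBytes = new AtomicLong();

    byte[] allocate(int size) {
        outstandingBytes.addAndGet(size);
        return new byte[size];
    }

    void release(byte[] chunk) {
        outstandingBytes.addAndGet(-chunk.length);
    }
}

// Hypothetical region whose open() replays edits into pooled chunks.
final class ToyRegion {
    private final ToyChunkPool pool;
    private final List<byte[]> chunks = new ArrayList<>();

    ToyRegion(ToyChunkPool pool) {
        this.pool = pool;
    }

    // Replays each edit into its own chunk; exceeding maxBytes stands in for the OutOfMemoryError.
    void open(int[] editSizes, long maxBytes, boolean rollbackOnFailure) {
        long used = 0;
        boolean opened = false;
        try {
            for (int size : editSizes) {
                if (used + size > maxBytes) {
                    throw new IllegalStateException("replay failed: memory exhausted");
                }
                chunks.add(pool.allocate(size));
                used += size;
            }
            opened = true;
        } finally {
            // Runs for exceptions and Errors alike; skipping it leaves the chunks counted forever.
            if (!opened && rollbackOnFailure) {
                for (byte[] chunk : chunks) {
                    pool.release(chunk);
                }
                chunks.clear();
            }
        }
    }
}
```

With edits of 40, 40 and 40 bytes and a 100-byte budget, the third edit fails. Without rollback, 80 bytes remain outstanding in the pool; with rollback, the pool drops back to 0.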

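The accounting problem in the last paragraph can be illustrated with a second toy model, assuming a single region being opened. The class and method names are hypothetical and only loosely mirror RegionServerAccounting. A flush during replay frees memstore bytes but leaves the per-region replay counter untouched, so rolling back by that counter over-decrements the global size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical accounting model for one region being opened; not HBase's RegionServerAccounting.
final class ToyAccounting {
    long globalMemStoreSize;   // bytes accounted across the region server
    long regionMemStoreSize;   // bytes the region's memstore currently holds
    final Map<String, Long> replayEditsPerRegion = new HashMap<>();

    void replay(String region, long bytes) {
        globalMemStoreSize += bytes;
        regionMemStoreSize += bytes;
        replayEditsPerRegion.merge(region, bytes, Long::sum);
    }

    // A flush during replay frees memstore bytes but does not touch the replay counter.
    void flush() {
        globalMemStoreSize -= regionMemStoreSize;
        regionMemStoreSize = 0;
    }

    // Flawed rollback: subtracts everything ever replayed, including bytes already flushed.
    void rollbackByReplayCounter(String region) {
        Long replayed = replayEditsPerRegion.remove(region);
        if (replayed != null) {
            globalMemStoreSize -= replayed;
        }
    }

    // Rollback based on what the region's memstore actually still holds.
    void rollbackByCurrentSize(String region) {
        replayEditsPerRegion.remove(region);
        globalMemStoreSize -= regionMemStoreSize;
        regionMemStoreSize = 0;
    }
}
```

Replaying 100 bytes, flushing, replaying 50 more and then failing leaves the global size at -100 with the counter-based rollback (50 - 150), but at 0 when the rollback uses the memstore's current size (50 - 50).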


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)