Allan Yang created HBASE-21031:
----------------------------------
Summary: Memory leak if replay edits failed during region opening
Key: HBASE-21031
URL: https://issues.apache.org/jira/browse/HBASE-21031
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang
Due to HBASE-21029, when replaying edits that contain many identical cells, the
memstore won't flush, and an exception is thrown once all heap space is used:
{code}
2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., starting to roll back the global memstore size.
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
    at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
    at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
    at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
    at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
    at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
    at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
    at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
    at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
    at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
    at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
    at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
    at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
{code}
After this exception, the memstore was not rolled back, and since MSLAB is used,
none of the allocated chunks are ever released. That memory is leaked permanently.
We need to roll back the memory if opening the region fails (for now, only the
global memstore size is decreased after a failure).
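
To illustrate the kind of rollback meant here, below is a minimal sketch, not the actual HBase code. SketchChunkPool and SketchRegionOpener are hypothetical stand-ins invented for this example (they are not ChunkCreator, MemStoreLABImpl or HRegion), and the failure is simulated with a RuntimeException rather than a real OutOfMemoryError. The idea is only that every chunk taken during replay is handed back when the open fails.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical chunk allocator for illustration; not HBase's ChunkCreator.
final class SketchChunkPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int chunkSize;
    private int outstanding = 0;

    SketchChunkPool(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Hands out a pooled chunk if one is free, otherwise allocates a new one.
    byte[] allocate() {
        outstanding++;
        byte[] chunk = free.poll();
        return chunk != null ? chunk : new byte[chunkSize];
    }

    // Returns a chunk to the pool so it can be reused.
    void release(byte[] chunk) {
        outstanding--;
        free.push(chunk);
    }

    // Number of chunks handed out and not yet returned.
    int outstanding() {
        return outstanding;
    }
}

// Hypothetical region-open path for illustration; not HBase's HRegion.
final class SketchRegionOpener {
    // Replays `edits` edits, taking one chunk per edit. A failure is
    // simulated at index `failAt` (pass -1 for no failure).
    static boolean openRegion(SketchChunkPool pool, int edits, int failAt) {
        List<byte[]> held = new ArrayList<>();
        try {
            for (int i = 0; i < edits; i++) {
                if (i == failAt) {
                    throw new IllegalStateException("simulated replay failure");
                }
                held.add(pool.allocate());
            }
            return true; // region opened; the memstore keeps its chunks
        } catch (RuntimeException e) {
            // Roll back: give every chunk taken during replay back to the pool.
            for (byte[] chunk : held) {
                pool.release(chunk);
            }
            return false;
        }
    }
}
```

In this sketch a failed open leaves pool.outstanding() at 0, while a successful open keeps its chunks. How the real fix should release MSLAB chunks (for example by closing the stores' memstores) is left open here.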
Another problem is that we use replayEditsPerRegion in RegionServerAccounting
to record how much memory is used during replay, and decrease the global
memstore size by that amount if replay fails. This is not right: the memstore
may also be flushed during replay, so the size recorded in the
replayEditsPerRegion map is not accurate at all!
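
A small sketch of why the side counter drifts, under the assumption (implied above, not verified against the code here) that a flush during replay lowers the global size but does not touch the per-region replay counter. SketchAccounting and all of its fields and methods are hypothetical names made up for this example; they are not the RegionServerAccounting API.

```java
// Hypothetical accounting model for illustration; not RegionServerAccounting.
final class SketchAccounting {
    long globalMemStoreSize; // server-wide memstore size
    long regionMemStoreSize; // what the region's memstore currently holds
    long replayedBytes;      // side counter, playing the role of replayEditsPerRegion

    // Replaying an edit grows all three counters.
    void replayEdit(long bytes) {
        globalMemStoreSize += bytes;
        regionMemStoreSize += bytes;
        replayedBytes += bytes;
    }

    // A flush during replay empties the memstore and lowers the global size,
    // but (by assumption) leaves the side counter untouched.
    void flush() {
        globalMemStoreSize -= regionMemStoreSize;
        regionMemStoreSize = 0;
    }

    // Rollback based on the side counter: subtracts flushed bytes a second time.
    void rollbackUsingReplayCounter() {
        globalMemStoreSize -= replayedBytes;
    }

    // Rollback based on what the memstore actually still holds.
    void rollbackUsingRegionSize() {
        globalMemStoreSize -= regionMemStoreSize;
        regionMemStoreSize = 0;
    }
}
```

For example, replaying 100 bytes, flushing, then replaying 30 more leaves 30 bytes in the memstore. Rolling back by the side counter subtracts 130 and drives the global size to -100 in this model, while rolling back by the region's current size brings it to 0.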