[ https://issues.apache.org/jira/browse/HBASE-16193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yu Li resolved HBASE-16193.
---------------------------
Resolution: Fixed
Closing this umbrella issue since all sub-issues are resolved and the fixes have
been pushed into all 0.98+ branches.
> Memory leak when putting plenty of duplicated cells
> ---------------------------------------------------
>
> Key: HBASE-16193
> URL: https://issues.apache.org/jira/browse/HBASE-16193
> Project: HBase
> Issue Type: Bug
> Reporter: Yu Li
> Assignee: Yu Li
> Attachments: MemoryLeakInMemStore.png, MemoryLeakInMemStore_2.png
>
>
> Recently we ran into a weird problem: the RS heap size could not be reduced
> much even after a full GC, so the RegionServer kept doing full GCs and could
> hardly serve any requests. After days of debugging, we found the root cause:
> we don't count the memory allocated in the MSLAB chunk when adding duplicated
> cells (including puts and deletes). We have the following code in
> {{AbstractMemStore#add}} (or {{DefaultMemStore#add}} for branch-1):
> {code}
> public long add(Cell cell) {
> Cell toAdd = maybeCloneWithAllocator(cell);
> return internalAdd(toAdd);
> }
> {code}
> where we first allocate memory for the cell in an MSLAB chunk (if MSLAB is
> enabled) and then call {{internalAdd}}, which contains the following code in
> {{Segment#internalAdd}} (or {{DefaultMemStore#internalAdd}} for branch-1):
> {code}
> protected long internalAdd(Cell cell) {
> boolean succ = getCellSet().add(cell);
> long s = AbstractMemStore.heapSizeChange(cell, succ);
> updateMetaInfo(cell, s);
> return s;
> }
> {code}
> So when we write a duplicated cell, {{succ}} is false and we assume there is no
> heap size change, while in fact the copied cell has already taken space in the
> chunk, and the chunk stays referenced.
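To make the two-step pattern above concrete, here is a minimal, self-contained model. It is not HBase code: {{Chunk}}, {{SketchMemStore}} and the sizes are hypothetical, and a "duplicate" is simplified to "same key string", with the cell set modelled as a map that keeps one entry per key. Step 1 always consumes chunk space. Step 2 only counts the size when the key is new.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class DuplicateCellAccountingSketch {

    /** Hypothetical stand-in for an MSLAB chunk: bytes are handed out sequentially. */
    static final class Chunk {
        final int capacity;
        int used = 0;

        Chunk(int capacity) { this.capacity = capacity; }

        void allocate(int len) {
            if (used + len > capacity) {
                throw new IllegalStateException("chunk full");
            }
            used += len;
        }
    }

    /** Hypothetical stand-in for a memstore segment (not HBase's API). */
    static final class SketchMemStore {
        // Keyed by cell key only; writing the same key again replaces the entry.
        final ConcurrentSkipListMap<String, byte[]> cellSet = new ConcurrentSkipListMap<>();
        final Chunk chunk;
        long accountedSize = 0; // the size a flush check would look at

        SketchMemStore(int chunkSize) { chunk = new Chunk(chunkSize); }

        long add(String key, byte[] value) {
            // Step 1 (like maybeCloneWithAllocator): the copy always consumes chunk space.
            chunk.allocate(value.length);
            // Step 2 (like internalAdd): size is only counted when the key was new.
            boolean succ = cellSet.put(key, value) == null;
            long delta = succ ? value.length : 0;
            accountedSize += delta;
            return delta;
        }
    }

    public static void main(String[] args) {
        SketchMemStore ms = new SketchMemStore(1024);
        ms.add("cellB", new byte[64]);
        for (int i = 0; i < 15; i++) {
            ms.add("cellA", new byte[64]); // same key, new value each time
        }
        // Only 2 distinct cells are counted, but 16 copies fill the 1024-byte chunk.
        System.out.println("accounted=" + ms.accountedSize + " chunkUsed=" + ms.chunk.used);
    }
}
```

Running it prints `accounted=128 chunkUsed=1024`: the chunk is full, but only two cells' worth of size is accounted.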
> Consider this scenario: a huge number of writes on the same cell (same key,
> different values), which is not unusual in machine-learning use cases, plus a
> few normal writes. After a long time, we may end up with many chunks holding
> KVs like {{cellA, cellB, cellA, cellA, .... cellA}}, where we count only 2
> cells per chunk although the chunk is actually full. This is where the trouble
> starts: we think the memstore has not reached the flush size yet, while GBs of
> heap are already taken.
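The gap between accounted size and allocated chunk bytes can be simulated against a flush threshold. This is a hypothetical sketch: the 128 MB flush size, 100-byte cells and "one normal write per 1000" ratio are illustrative assumptions. The {{countDuplicates}} switch models one possible alternative accounting, where copies of duplicates are also counted. It is shown only for comparison and is not necessarily what the committed fix does.

```java
import java.util.HashMap;
import java.util.Map;

public class FlushThresholdSketch {
    // Illustrative flush size; not read from any HBase configuration.
    static final long FLUSH_SIZE = 128L * 1024 * 1024;

    static final class Result {
        long accounted;          // size the flush check sees
        long allocated;          // bytes actually copied into chunks
        boolean flushTriggered;
    }

    /**
     * Every normalEvery-th write uses a fresh key; all other writes overwrite "cellA".
     * With countDuplicates=false, only new keys are counted, as in the description.
     */
    static Result run(long writes, int cellLen, int normalEvery, boolean countDuplicates) {
        Map<String, Integer> cells = new HashMap<>();
        Result r = new Result();
        for (long i = 0; i < writes; i++) {
            String key = (i % normalEvery == 0) ? "cell" + i : "cellA";
            r.allocated += cellLen; // every write copies the cell into a chunk
            boolean isNew = cells.put(key, cellLen) == null;
            if (isNew || countDuplicates) {
                r.accounted += cellLen;
            }
            if (r.accounted >= FLUSH_SIZE) {
                r.flushTriggered = true;
                break;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result asDescribed = run(4_000_000L, 100, 1000, false);
        Result countingCopies = run(4_000_000L, 100, 1000, true);
        System.out.printf("as described:    accounted=%d B, allocated=%d B, flush=%b%n",
                asDescribed.accounted, asDescribed.allocated, asDescribed.flushTriggered);
        System.out.printf("counting copies: accounted=%d B, allocated=%d B, flush=%b%n",
                countingCopies.accounted, countingCopies.allocated, countingCopies.flushTriggered);
    }
}
```

In the first run about 400 MB is copied into chunks while only about 400 KB is accounted, so the flush never fires. In the second run the accounted size tracks the allocated bytes and reaches the threshold.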
> There is also a more extreme case, where we write only a single cell over and
> over again and fill up chunks quickly. Ideally the full chunks should be
> reclaimed by GC, but unfortunately we keep a redundant reference to them in
> {{HeapMemStoreLAB#chunkQueue}}, which is useless since the chunk pool is not
> used by default.
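Why the extra queue matters can be shown with a toy allocator. {{SketchLab}} below is hypothetical and is not HeapMemStoreLAB. It rolls to a new chunk whenever the current one is full. If it also records every chunk in a queue, those chunks stay strongly reachable even though, in the single-cell case, only the latest copy is still needed. Tracking chunks only when a pool will reuse them is one way to avoid this. Whether the committed sub-JIRA fix takes exactly this form is not stated here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ChunkRetentionSketch {

    /** Illustrative chunk: just a fixed-size byte array. */
    static final class Chunk {
        final byte[] data;
        Chunk(int size) { data = new byte[size]; }
    }

    /** Hypothetical MSLAB-like allocator (not HBase's HeapMemStoreLAB). */
    static final class SketchLab {
        private final int chunkSize;
        private final boolean trackChunks;
        private final Queue<Chunk> chunkQueue = new ArrayDeque<>();
        private Chunk current;
        private int offset;
        private int chunksCreated;

        SketchLab(int chunkSize, boolean poolEnabled, boolean alwaysTrack) {
            this.chunkSize = chunkSize;
            // Tracking only pays off if chunks are handed back to a pool later.
            this.trackChunks = poolEnabled || alwaysTrack;
        }

        /** Reserves len bytes, rolling to a fresh chunk when the current one is full. */
        Chunk copyCell(int len) {
            if (current == null || offset + len > chunkSize) {
                current = new Chunk(chunkSize);
                offset = 0;
                chunksCreated++;
                if (trackChunks) {
                    chunkQueue.add(current); // pins the chunk as long as this lab is reachable
                }
            }
            offset += len;
            return current;
        }

        int chunksCreated() { return chunksCreated; }

        int chunksPinnedByQueue() { return chunkQueue.size(); }
    }

    public static void main(String[] args) {
        for (boolean alwaysTrack : new boolean[] {true, false}) {
            SketchLab lab = new SketchLab(1024, false, alwaysTrack);
            // The same cell is rewritten 160 times; only the latest copy is still needed.
            for (int i = 0; i < 160; i++) {
                lab.copyCell(64);
            }
            System.out.println("alwaysTrack=" + alwaysTrack
                    + " created=" + lab.chunksCreated()
                    + " pinnedByQueue=" + lab.chunksPinnedByQueue());
        }
    }
}
```

With the pool disabled, the always-tracking variant pins all 10 chunks it created, while the pool-aware variant pins none and leaves the old chunks to GC.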
> This umbrella issue describes the problem; I'll open two sub-JIRAs to resolve
> the two issues above separately.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)