[
https://issues.apache.org/jira/browse/HBASE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089509#comment-16089509
]
Anastasia Braginsky commented on HBASE-18375:
---------------------------------------------
bq. When there is a transfer from a flat map to chunkMap, we were converting weak refs to strong refs, right? Then why does this problem happen? Sorry if I am missing something.
Here is a detailed explanation of the problem, which is general and not
directly related to CellChunkMap. Let's assume the segments are implemented
with CellArrayMap and the following scenario happens:
1. Chunk C is allocated from the pool and is used as part of Segment S. S is
currently part of the compaction pipeline.
2. Due to a snapshot or a compaction of the pipeline, segment S is swapped
out of the pipeline. C is now unreachable by any strong reference: the MSLAB
holds only chunk IDs, the Cells referencing C are themselves unreachable, and
the weak references in the ChunkCreator map do not prevent the GC from
reclaiming C. Let us also assume no scan was running in parallel.
3. When S is closed, C is returned to ChunkCreator, which in turn returns C to
the pool, but in parallel the GC is already freeing C's "unreachable"
ByteBuffer.
4. As a result, the uninitialized chunk C ends up in the pool and is later
allocated for some other use.
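To make step 2 concrete, here is a minimal, self-contained sketch (not HBase code) of why an MSLAB that keeps only chunk IDs, combined with a map of weak references, cannot keep C alive. DemoChunk, WeakChunkRegistry, IdOnlyMslab and WeakMapHazardDemo are hypothetical names. A real collector clears a weak reference at an arbitrary moment, so the sketch calls WeakReference.clear() to simulate that deterministically.

{code:java}
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an MSLAB chunk: an ID plus a data buffer. */
final class DemoChunk {
  final int id;
  final byte[] data;
  DemoChunk(int id, int size) { this.id = id; this.data = new byte[size]; }
}

/** Hypothetical registry that, like the weak map discussed here, holds only weak references. */
final class WeakChunkRegistry {
  private final Map<Integer, WeakReference<DemoChunk>> weakMap = new ConcurrentHashMap<>();

  void register(DemoChunk c) { weakMap.put(c.id, new WeakReference<>(c)); }

  /** Returns the chunk, or null if its weak reference has already been cleared. */
  DemoChunk lookup(int id) {
    WeakReference<DemoChunk> ref = weakMap.get(id);
    return ref == null ? null : ref.get();
  }

  /** Test hook: simulates the GC clearing the weak reference to an unreachable chunk. */
  void simulateGcClear(int id) {
    WeakReference<DemoChunk> ref = weakMap.get(id);
    if (ref != null) ref.clear();
  }
}

/** Hypothetical MSLAB that records chunk IDs only, never the chunk objects. */
final class IdOnlyMslab {
  final List<Integer> chunkIds = new ArrayList<>();
}

final class WeakMapHazardDemo {
  /** Walks through steps 1 to 3 and returns what the close path finds for C's ID. */
  static DemoChunk runScenario() {
    WeakChunkRegistry registry = new WeakChunkRegistry();
    IdOnlyMslab mslab = new IdOnlyMslab();

    // Step 1: chunk C is allocated and used by segment S; the MSLAB records only its ID.
    DemoChunk c = new DemoChunk(7, 1024);
    registry.register(c);
    mslab.chunkIds.add(c.id);

    // Step 2: S leaves the pipeline; no strong reference to C remains.
    c = null;
    // Only a weak reference is left, so the collector may clear it at any time.
    registry.simulateGcClear(7);

    // Step 3: closing S tries to hand C back using the ID it kept.
    return registry.lookup(mslab.chunkIds.get(0));
  }
}
{code}

Here runScenario() returns null: the ID is still known, but the object behind it is gone, which is exactly the window in which a chunk can be lost or handed out in a bad state.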
I personally hit this problem when running on the machine.
After the fix, the problem no longer appeared.
Regarding CellChunkMap, you are right that we shouldn't see the problem
there, since CellChunkMap's data chunks are covered by strongMap and
shouldn't be released by GC. However, when S is closed, its chunks are released
to ChunkCreator and the following code is invoked:
{code}
private void putbackChunks(Set<Integer> chunks) {
  int toAdd = Math.min(chunks.size(), this.maxCount - reclaimedChunks.size());
  Iterator<Integer> iterator = chunks.iterator();
  while (iterator.hasNext()) {
    Integer chunkId = iterator.next();
    // remove the chunks every time though they are from the pool or not
    // <-------- the chunk is disconnected from either the weak or the strong map,
    // so there is a period of time when the chunk is not covered by any
    // reference and is still unreachable
    Chunk chunk = ChunkCreator.this.removeChunk(chunkId);
    if (chunk != null) {
      if (chunk.isFromPool() && toAdd > 0) {
        reclaimedChunks.add(chunk);
      }
      toAdd--;
    }
  }
}
{code}
Therefore, the solution is to keep the pool chunks covered by the strong map permanently.
Hope it is clearer now.
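A minimal sketch of that idea, under simplifying assumptions: every chunk is registered in a strong map when it is created, and putbackChunks removes only non-pool chunks from it, so a pool chunk is covered by a strong reference at every moment, including while it sits idle in the pool. PoolChunk and StrongPoolChunkCreator are hypothetical simplifications, not the actual HBASE-18375 patch.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical chunk type; only the flag relevant to the fix is modeled. */
final class PoolChunk {
  final int id;
  final boolean fromPool;
  PoolChunk(int id, boolean fromPool) { this.id = id; this.fromPool = fromPool; }
}

/**
 * Hypothetical, simplified creator: all chunks live in a strong map, and
 * pool chunks are never removed from it, so they stay reachable while idle.
 */
final class StrongPoolChunkCreator {
  private final Map<Integer, PoolChunk> strongMap = new ConcurrentHashMap<>();
  private final Deque<PoolChunk> reclaimedChunks = new ArrayDeque<>();
  private final int maxCount;

  StrongPoolChunkCreator(int maxCount) { this.maxCount = maxCount; }

  PoolChunk create(int id, boolean fromPool) {
    PoolChunk c = new PoolChunk(id, fromPool);
    strongMap.put(id, c); // strong reference taken at creation time
    return c;
  }

  void putbackChunks(Set<Integer> chunkIds) {
    for (Integer id : chunkIds) {
      // Look up without removing: there is no window where the chunk is uncovered.
      PoolChunk chunk = strongMap.get(id);
      if (chunk == null) continue;
      if (chunk.fromPool && reclaimedChunks.size() < maxCount) {
        // Pool chunk: keep its strong-map entry and make it available again.
        reclaimedChunks.addLast(chunk);
      } else {
        // Non-pool chunk (or a full pool): drop it so the GC can reclaim it.
        strongMap.remove(id);
      }
    }
  }

  boolean isTracked(int id) { return strongMap.containsKey(id); }
  int pooledCount() { return reclaimedChunks.size(); }
}
{code}

The design choice in this sketch is that the strong map owns pool chunks for their whole lifetime, while the pool queue only records which of them are currently free.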
> The pool chunks from ChunkCreator are deallocated while in pool because there
> is no reference to them
> -----------------------------------------------------------------------------------------------------
>
> Key: HBASE-18375
> URL: https://issues.apache.org/jira/browse/HBASE-18375
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0-alpha-1
> Reporter: Anastasia Braginsky
> Priority: Critical
> Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18375-V01.patch
>
>
> Because the MSLAB list of chunks was changed to a list of chunk IDs, chunks
> returned to the pool can be deallocated by the JVM, as there is no reference
> to them. The solution is to protect pool chunks from GC via the strong map of
> ChunkCreator introduced by HBASE-18010. Will prepare the patch today.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)