[ https://issues.apache.org/jira/browse/HBASE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089509#comment-16089509 ]
Anastasia Braginsky commented on HBASE-18375:
---------------------------------------------

bq. When there is a transfer to flat map to chunkMap we were converting weak to strong ref right? Then why this problem happens? Sorry if am missing something.

Here is a detailed explanation of the problem, which is general and not directly related to CellChunkMap. Let's assume the segments are implemented with CellArrayMap and the following scenario happens:
1. Chunk C is allocated from the pool and is used as part of Segment S. S is currently part of the compaction pipeline.
2. Due to a snapshot or a compaction of the pipeline, segment S is swapped out of the pipeline. C is now unreachable by any strong reference: because MSLAB holds only chunk IDs, the Cells referencing C are themselves unreachable, and the weak references from the ChunkCreator map do not keep C alive during GC. Let us also assume no scan is happening in parallel.
3. When S is closed, C is returned to ChunkCreator, which in turn returns C to the pool, but in parallel the GC is already freeing C's "unreachable" ByteBuffer.
4. As a result, the uninitialized chunk C is in the pool and is later allocated for some other use.

I personally hit this problem on the machine. After the fix the problem didn't appear.

Regarding CellChunkMap, you are right that we shouldn't see the problem there, as CellChunkMap's data chunks are covered by strongMap and shouldn't be released by GC.
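
To make the weak-versus-strong distinction used above concrete, here is a minimal sketch. It is not the HBase code: ChunkRegistrySketch, its Chunk class, and the method names registerWeak, registerStrong, get, remove and isStronglyHeld are hypothetical, chosen only to illustrate that a weak map entry does not keep a chunk alive, while a strong map entry does for as long as it stays in the map.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of chunk bookkeeping; NOT the HBase ChunkCreator.
class ChunkRegistrySketch {

  // Stand-in for a chunk; the real one wraps a ByteBuffer.
  static final class Chunk {
    final int id;
    Chunk(int id) { this.id = id; }
  }

  // Weak entries: the GC may reclaim the chunk once nothing else refers to it.
  private final Map<Integer, WeakReference<Chunk>> weakMap = new ConcurrentHashMap<>();
  // Strong entries: the chunk stays reachable while the entry is in the map.
  private final Map<Integer, Chunk> strongMap = new ConcurrentHashMap<>();

  Chunk registerWeak(int id) {
    Chunk c = new Chunk(id);
    weakMap.put(id, new WeakReference<>(c));
    return c;
  }

  Chunk registerStrong(int id) {
    Chunk c = new Chunk(id);
    strongMap.put(id, c);
    return c;
  }

  // Looks the chunk up by id; returns null if it is unknown or already collected.
  Chunk get(int id) {
    Chunk c = strongMap.get(id);
    if (c != null) {
      return c;
    }
    WeakReference<Chunk> ref = weakMap.get(id);
    return ref == null ? null : ref.get();
  }

  // Drops the registry's entry, whichever map held it.
  Chunk remove(int id) {
    Chunk c = strongMap.remove(id);
    if (c != null) {
      return c;
    }
    WeakReference<Chunk> ref = weakMap.remove(id);
    return ref == null ? null : ref.get();
  }

  boolean isStronglyHeld(int id) {
    return strongMap.containsKey(id);
  }
}
```

In this sketch a chunk is protected only while its entry remains in the strong map; once remove is called, the registry no longer keeps it reachable.
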
However, while S is closed the chunks are released to ChunkCreator and the following code is invoked:
{code}
private void putbackChunks(Set<Integer> chunks) {
  int toAdd = Math.min(chunks.size(), this.maxCount - reclaimedChunks.size());
  Iterator<Integer> iterator = chunks.iterator();
  while (iterator.hasNext()) {
    Integer chunkId = iterator.next();
    // remove the chunks every time though they are from the pool or not
    Chunk chunk = ChunkCreator.this.removeChunk(chunkId);
    // <-------- the chunk is disconnected from either the weak or the strong map, so there is
    // a period of time during which the chunk is not covered by any reference and is still unreachable
    if (chunk != null) {
      if (chunk.isFromPool() && toAdd > 0) {
        reclaimedChunks.add(chunk);
      }
      toAdd--;
    }
  }
}
{code}
Therefore the solution is to cover the pool chunks by the strong map, forever. Hope it is clearer now.

> The pool chunks from ChunkCreator are deallocated while in pool because there
> is no reference to them
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18375
>                 URL: https://issues.apache.org/jira/browse/HBASE-18375
>             Project: HBase
>          Issue Type: Sub-task
>   Affects Versions: 2.0.0-alpha-1
>            Reporter: Anastasia Braginsky
>            Priority: Critical
>             Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2
>
>         Attachments: HBASE-18375-V01.patch
>
>
> Because MSLAB list of chunks was changed to list of chunk IDs, the chunks
> returned back to pool can be deallocated by JVM because there is no reference
> to them. The solution is to protect pool chunks from GC by the strong map of
> ChunkCreator introduced by HBASE-18010. Will prepare the patch today.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)