[ 
https://issues.apache.org/jira/browse/HBASE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089509#comment-16089509
 ] 

Anastasia Braginsky commented on HBASE-18375:
---------------------------------------------

bq. When there is a transfer to flat map to chunkMap we were converting weak to 
strong ref right? Then why this problem happens? Sorry if am missing something.

Here is the detailed explanation of the problem, which is general and not 
related to CellChunkMap directly. Let's assume the segments are implemented 
with CellArrayMap and the following scenario happens:

1. Chunk C is allocated from the pool and is used as part of Segment S. S is 
currently part of the compaction pipeline.
2. Due to a snapshot or a compaction of the pipeline, segment S is swapped 
out of the pipeline. C is now unreachable by any reference. Because MSLAB 
holds only chunk IDs, Cells referencing C are unreachable themselves, and 
weak references from the ChunkCreator map aren't considered by GC. Let us also 
assume no scan was happening in parallel.
3. When S is closed, C is returned to ChunkCreator, which in turn returns C to 
the pool, but in parallel the GC is already freeing C's "unreachable" 
ByteBuffer.
4. As a result, the uninitialized chunk C is in the pool and is later allocated 
for some other use.
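
To make the scenario above concrete, here is a minimal sketch of a registry 
that tracks buffers by ID through either a weak or a strong map. The class 
ChunkRegistrySketch and all of its methods are hypothetical stand-ins, not the 
HBase ChunkCreator API, and simulateGc() only emulates what the collector may 
do to a weak reference once nothing else holds the buffer.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a ChunkCreator-like registry; not the HBase class.
class ChunkRegistrySketch {
  // Entries here do not keep their buffers alive.
  private final Map<Integer, WeakReference<byte[]>> weakMap = new ConcurrentHashMap<>();
  // Entries here keep their buffers strongly reachable.
  private final Map<Integer, byte[]> strongMap = new ConcurrentHashMap<>();

  void registerWeak(int id, byte[] buf) {
    weakMap.put(id, new WeakReference<>(buf));
  }

  void registerStrong(int id, byte[] buf) {
    strongMap.put(id, buf);
  }

  // Emulates the collector clearing a weak reference (assumption: no other
  // strong reference to the buffer exists). A real GC does this at a time of
  // its own choosing; clear() just makes the effect deterministic here.
  void simulateGc(int id) {
    WeakReference<byte[]> ref = weakMap.get(id);
    if (ref != null) {
      ref.clear();
    }
  }

  // Resolves an ID to its buffer, or null if the buffer is gone.
  byte[] lookup(int id) {
    byte[] strong = strongMap.get(id);
    if (strong != null) {
      return strong;
    }
    WeakReference<byte[]> ref = weakMap.get(id);
    return ref == null ? null : ref.get();
  }
}
```

In this sketch, an ID registered only through registerWeak() can resolve to 
null after simulateGc(), while an ID registered through registerStrong() keeps 
resolving to its buffer.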

I personally hit this problem on my machine.
After the fix, the problem no longer appeared.

Regarding the CellChunkMap, you are right to say that there we shouldn't see 
the problem, as CellChunkMap's data chunks are covered by the strong map and 
shouldn't be released by GC. However, when S is closed the chunks are released 
to ChunkCreator and the following code is invoked:
{code}
private void putbackChunks(Set<Integer> chunks) {
      int toAdd = Math.min(chunks.size(), this.maxCount - reclaimedChunks.size());
      Iterator<Integer> iterator = chunks.iterator();
      while (iterator.hasNext()) {
        Integer chunkId = iterator.next();
        // remove the chunks every time though they are from the pool or not
        // the chunk is disconnected here from either the weak or the strong map,
        // so there is a period of time when the chunk is uncovered by any
        // reference and still unreachable
        Chunk chunk = ChunkCreator.this.removeChunk(chunkId);
        if (chunk != null) {
          if (chunk.isFromPool() && toAdd > 0) {
            reclaimedChunks.add(chunk);
          }
          toAdd--;
        }
      }
    }
{code}

Therefore, the solution is to keep the pool chunks covered by the strong map 
permanently. I hope it is clearer now.
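
As an illustration of that idea only (this is not the attached patch), a 
putback routine could leave pool chunks in the strong map and remove only the 
non-pool ones. PoolPutbackSketch, its nested Chunk type and its field names 
are hypothetical, and the maxCount limit from the code above is left out for 
brevity.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "pool chunks stay in the strong map"; not HBase code.
class PoolPutbackSketch {
  static final class Chunk {
    final int id;
    final boolean fromPool;

    Chunk(int id, boolean fromPool) {
      this.id = id;
      this.fromPool = fromPool;
    }
  }

  // Strong map: anything stored here stays reachable.
  final Map<Integer, Chunk> strongMap = new ConcurrentHashMap<>();
  // Chunks available for reuse.
  final Queue<Chunk> reclaimed = new ArrayDeque<>();

  void putback(Iterable<Integer> chunkIds) {
    for (Integer id : chunkIds) {
      Chunk chunk = strongMap.get(id);
      if (chunk == null) {
        continue; // unknown ID, nothing to do
      }
      if (chunk.fromPool) {
        // Pool chunk: reclaim it but keep its strong map entry, so there is
        // no window in which it is unreferenced.
        reclaimed.add(chunk);
      } else {
        // Non-pool chunk: drop the entry and let GC reclaim it.
        strongMap.remove(id);
      }
    }
  }
}
```

With this shape, a pool chunk is never without a strong reference between 
leaving a segment and being handed out again.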

> The pool chunks from ChunkCreator are deallocated while in pool because there 
> is no reference to them
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18375
>                 URL: https://issues.apache.org/jira/browse/HBASE-18375
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Anastasia Braginsky
>            Priority: Critical
>             Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2
>
>         Attachments: HBASE-18375-V01.patch
>
>
> Because MSLAB list of chunks was changed to list of chunk IDs, the chunks 
> returned back to pool can be deallocated by JVM because there is no reference 
> to them. The solution is to protect pool chunks from GC by the strong map of 
> ChunkCreator introduced by HBASE-18010. Will prepare the patch today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
