[jira] [Updated] (HBASE-26567) Remove IndexType from ChunkCreator

chenglei (Jira) Wed, 15 Dec 2021 01:23:04 -0800


     [ 
https://issues.apache.org/jira/browse/HBASE-26567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


chenglei updated HBASE-26567:
-----------------------------
    Description: 
For {{CellChunkImmutableSegment}}, cells in data chunks is indexed in index 
chunks by ChunkID ,not by strong reference, so in order to prevent the data 
chunks in  {{CellChunkMap}} from GC by JVM,  HBASE-18375 put these data chunks 
in {{ChunkCreator.chunkIdMap}}  which is a the strong map. Whether or not the 
{{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on 
the data chunk is pooled or the  index type  is {{IndexType.CHUNK_MAP}}. So 
when we call  {{ChunkCreator.getChunk}}, it is very important to specify the 
IndexType. {{ChunkCreator}} has following package-visible method, which does 
not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}.

{code:java}
  Chunk getChunk(ChunkType chunkType) {
    return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
  }
{code}

This method is used by following line 363 in 
{{MemStoreLABImpl.getNewExternalChunk}} now, which is used by 
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} which index type is 
{{IndexType.CHUNK_MAP}} to force copy a big cell into {{MemStoreLAB}}, which 
should allocate a new not-pooled chunk.
{code:java}
359  public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
360    switch (chunkType) {
361      case INDEX_CHUNK:
362      case DATA_CHUNK:
363        Chunk c = this.chunkCreator.getChunk(chunkType);
364        chunks.add(c.getId());
365        return c;
366      case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
367      default:
368        return null;
369    }
370  }
{code}

Even though fortunately this {{IndexType}} mismatch does not cause bug now 
because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not invoked by 
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than 
chunkSize，but I think it is dangerous to not specify the {{IndexType}} if we 
want to refactor or add new functionalities in the future. We would better to 
specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.

When thinking this problem more, I think we could remove the {{IndexType}} from 
{{ChunkCreator}} completely to simplify the code, because when use 
{{MemStoreLABImpl}},
the {{IndexType}} could only be  {{IndexType.CHUNK_MAP}} ,just as following 
line 106 in {{MemStoreLABImpl}} ctor.
{code:java}
96  public MemStoreLABImpl(Configuration conf) {
97    dataChunkSize = conf.getInt(CHUNK_SIZE_KEY, CHUNK_SIZE_DEFAULT);
98    maxAlloc = conf.getInt(MAX_ALLOC_KEY, MAX_ALLOC_DEFAULT);
99    this.chunkCreator = ChunkCreator.getInstance();
100    // if we don't exclude allocations >CHUNK_SIZE, we'd infiniteloop on one!
101    Preconditions.checkArgument(maxAlloc <= dataChunkSize,
102        MAX_ALLOC_KEY + " must be less than " + CHUNK_SIZE_KEY);
103
104     // if user requested to work with MSLABs (whether on- or off-heap), 
then the
105    // immutable segments are going to use CellChunkMap as their index
106    idxType = CompactingMemStore.IndexType.CHUNK_MAP;
107  }
{code}

And because the get {{chunk}} from {{ChunkCreator}} is only used by 
{{MemStoreLABImpl}}, and only when we not use {{MemStoreLABImpl}}  we could use 
{{IndexType.ARRAY_MAP}}, we can remove the {{IndexType}} from {{ChunkCreator}} 
completely.


  was:
For {{CellChunkImmutableSegment}}, cells in data chunks is indexed in index 
chunks by ChunkID ,not by strong reference, so in order to prevent the data 
chunks in  {{CellChunkMap}} from GC by JVM,  HBASE-18375 put these data chunks 
in {{ChunkCreator.chunkIdMap}}  which is a the strong map. Whether or not the 
{{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on 
the data chunk is pooled or the  index type  is {{IndexType.CHUNK_MAP}}. So 
when we call  {{ChunkCreator.getChunk}}, it is very important to specify the 
IndexType. {{ChunkCreator}} has following package-visible method, which does 
not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}.

{code:java}
  Chunk getChunk(ChunkType chunkType) {
    return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
  }
{code}

This method is used by following line 363 in 
{{MemStoreLABImpl.getNewExternalChunk}} now, which is used by 
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} which index type is 
{{IndexType.CHUNK_MAP}} to force copy a big cell into {{MemStoreLAB}}, which 
should allocate a new not-pooled chunk.
{code:java}
359  public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
360    switch (chunkType) {
361      case INDEX_CHUNK:
362      case DATA_CHUNK:
363        Chunk c = this.chunkCreator.getChunk(chunkType);
364        chunks.add(c.getId());
365        return c;
366      case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
367      default:
368        return null;
369    }
370  }
{code}

Even though fortunately this {{IndexType}} mismatch does not cause bug now 
because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not invoked by 
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than 
chunkSize，but I think it is dangerous to not specify the {{IndexType}} if we 
want to refactor or add new functionalities in the future. We would better to 
specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.

When thinking this problem more, I think we could remove the {{IndexType}} from 
{{ChunkCreator}} completely to simplify the code, because when use 
{{MemStoreLABImpl}},
the {{IndexType}} could only be  {{IndexType.CHUNK_MAP}} ,just as following 
line 106 in {{MemStoreLABImpl}} ctor.
{code:java}
96  public MemStoreLABImpl(Configuration conf) {
97    dataChunkSize = conf.getInt(CHUNK_SIZE_KEY, CHUNK_SIZE_DEFAULT);
98    maxAlloc = conf.getInt(MAX_ALLOC_KEY, MAX_ALLOC_DEFAULT);
99    this.chunkCreator = ChunkCreator.getInstance();
100    // if we don't exclude allocations >CHUNK_SIZE, we'd infiniteloop on one!
101    Preconditions.checkArgument(maxAlloc <= dataChunkSize,
102        MAX_ALLOC_KEY + " must be less than " + CHUNK_SIZE_KEY);
103
104     // if user requested to work with MSLABs (whether on- or off-heap), 
then the
105    // immutable segments are going to use CellChunkMap as their index
106    idxType = CompactingMemStore.IndexType.CHUNK_MAP;
107  }
{code}

And because the get {{chunk}} from {{ChunkCreator}} is only used by 
{{MemStoreLABImpl}}, and only when we not use {{MemStoreLABImpl}}  we could use 
{{IndexType.ARRAY_MAP}}, so we can remove the {{IndexType}} from 
{{ChunkCreator}} completely.



> Remove IndexType from ChunkCreator
> ----------------------------------
>
>                 Key: HBASE-26567
>                 URL: https://issues.apache.org/jira/browse/HBASE-26567
>             Project: HBase
>          Issue Type: Improvement
>          Components: in-memory-compaction
>    Affects Versions: 3.0.0-alpha-1, 2.4.8
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>
> For {{CellChunkImmutableSegment}}, cells in data chunks is indexed in index 
> chunks by ChunkID ,not by strong reference, so in order to prevent the data 
> chunks in  {{CellChunkMap}} from GC by JVM,  HBASE-18375 put these data 
> chunks in {{ChunkCreator.chunkIdMap}}  which is a the strong map. Whether or 
> not the {{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} 
> depends on the data chunk is pooled or the  index type  is 
> {{IndexType.CHUNK_MAP}}. So when we call  {{ChunkCreator.getChunk}}, it is 
> very important to specify the IndexType. {{ChunkCreator}} has following 
> package-visible method, which does not require the {{IndexType}} parameter 
> and use {{IndexType.ARRAY_MAP}}.
> {code:java}
>   Chunk getChunk(ChunkType chunkType) {
>     return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
>   }
> {code}
> This method is used by following line 363 in 
> {{MemStoreLABImpl.getNewExternalChunk}} now, which is used by 
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} which index type is 
> {{IndexType.CHUNK_MAP}} to force copy a big cell into {{MemStoreLAB}}, which 
> should allocate a new not-pooled chunk.
> {code:java}
> 359  public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
> 360    switch (chunkType) {
> 361      case INDEX_CHUNK:
> 362      case DATA_CHUNK:
> 363        Chunk c = this.chunkCreator.getChunk(chunkType);
> 364        chunks.add(c.getId());
> 365        return c;
> 366      case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
> 367      default:
> 368        return null;
> 369    }
> 370  }
> {code}
> Even though fortunately this {{IndexType}} mismatch does not cause bug now 
> because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not invoked by 
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than 
> chunkSize，but I think it is dangerous to not specify the {{IndexType}} if we 
> want to refactor or add new functionalities in the future. We would better to 
> specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.
> When thinking this problem more, I think we could remove the {{IndexType}} 
> from {{ChunkCreator}} completely to simplify the code, because when use 
> {{MemStoreLABImpl}},
> the {{IndexType}} could only be  {{IndexType.CHUNK_MAP}} ,just as following 
> line 106 in {{MemStoreLABImpl}} ctor.
> {code:java}
> 96  public MemStoreLABImpl(Configuration conf) {
> 97    dataChunkSize = conf.getInt(CHUNK_SIZE_KEY, CHUNK_SIZE_DEFAULT);
> 98    maxAlloc = conf.getInt(MAX_ALLOC_KEY, MAX_ALLOC_DEFAULT);
> 99    this.chunkCreator = ChunkCreator.getInstance();
> 100    // if we don't exclude allocations >CHUNK_SIZE, we'd infiniteloop on 
> one!
> 101    Preconditions.checkArgument(maxAlloc <= dataChunkSize,
> 102        MAX_ALLOC_KEY + " must be less than " + CHUNK_SIZE_KEY);
> 103
> 104     // if user requested to work with MSLABs (whether on- or off-heap), 
> then the
> 105    // immutable segments are going to use CellChunkMap as their index
> 106    idxType = CompactingMemStore.IndexType.CHUNK_MAP;
> 107  }
> {code}
> And because the get {{chunk}} from {{ChunkCreator}} is only used by 
> {{MemStoreLABImpl}}, and only when we not use {{MemStoreLABImpl}}  we could 
> use {{IndexType.ARRAY_MAP}}, we can remove the {{IndexType}} from 
> {{ChunkCreator}} completely.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HBASE-26567) Remove IndexType from ChunkCreator

Reply via email to