[
https://issues.apache.org/jira/browse/HBASE-26567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chenglei updated HBASE-26567:
-----------------------------
Description:
For {{CellChunkImmutableSegment}}, cells in data chunks is indexed in index
chunks by ChunkID ,not by strong reference, so in order to prevent the data
chunks in {{CellChunkMap}} from GC by JVM, HBASE-18375 put these data chunks
in {{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the
{{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on
the data chunk is pooled or the index type is {{IndexType.CHUNK_MAP}}. So
when we call {{ChunkCreator.getChunk}}, it is very important to specify the
IndexType. {{ChunkCreator}} has following package-visible method, which does
not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}.
{code:java}
Chunk getChunk(ChunkType chunkType) {
return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
}
{code}
This method is used by following line 363 in
{{MemStoreLABImpl.getNewExternalChunk}} now, which is used by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} which index type is
{{IndexType.CHUNK_MAP}} to force copy a big cell into {{MemStoreLAB}}, which
should allocate a new not-pooled chunk.
{code:java}
359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
360 switch (chunkType) {
361 case INDEX_CHUNK:
362 case DATA_CHUNK:
363 Chunk c = this.chunkCreator.getChunk(chunkType);
364 chunks.add(c.getId());
365 return c;
366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
367 default:
368 return null;
369 }
370 }
{code}
Even though fortunately this {{IndexType}} mismatch does not cause bug now
because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not invoked by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than
chunkSize,but I think it is dangerous to not specify the {{IndexType}} if we
want to refactor or add new functionalities in the future. We would better to
specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.
When thinking this problem more, I think we could remove the {{IndexType}} from
{{ChunkCreator}} completely to simplify the code, because when use
{{MemStoreLABImpl}},
the {{IndexType}} could only be {{IndexType.CHUNK_MAP}} ,just as following
line 106 in {{MemStoreLABImpl}} ctor.
{code:java}
96 public MemStoreLABImpl(Configuration conf) {
97 dataChunkSize = conf.getInt(CHUNK_SIZE_KEY, CHUNK_SIZE_DEFAULT);
98 maxAlloc = conf.getInt(MAX_ALLOC_KEY, MAX_ALLOC_DEFAULT);
99 this.chunkCreator = ChunkCreator.getInstance();
100 // if we don't exclude allocations >CHUNK_SIZE, we'd infiniteloop on one!
101 Preconditions.checkArgument(maxAlloc <= dataChunkSize,
102 MAX_ALLOC_KEY + " must be less than " + CHUNK_SIZE_KEY);
103
104 // if user requested to work with MSLABs (whether on- or off-heap),
then the
105 // immutable segments are going to use CellChunkMap as their index
106 idxType = CompactingMemStore.IndexType.CHUNK_MAP;
107 }
{code}
And because the get {{chunk}} from {{ChunkCreator}} is only used by
{{MemStoreLABImpl}}, and only when we not use {{MemStoreLABImpl}} we could use
{{IndexType.ARRAY_MAP}}, we can remove the {{IndexType}} from {{ChunkCreator}}
completely.
was:
For {{CellChunkImmutableSegment}}, cells in data chunks is indexed in index
chunks by ChunkID ,not by strong reference, so in order to prevent the data
chunks in {{CellChunkMap}} from GC by JVM, HBASE-18375 put these data chunks
in {{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the
{{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on
the data chunk is pooled or the index type is {{IndexType.CHUNK_MAP}}. So
when we call {{ChunkCreator.getChunk}}, it is very important to specify the
IndexType. {{ChunkCreator}} has following package-visible method, which does
not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}.
{code:java}
Chunk getChunk(ChunkType chunkType) {
return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
}
{code}
This method is used by following line 363 in
{{MemStoreLABImpl.getNewExternalChunk}} now, which is used by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} which index type is
{{IndexType.CHUNK_MAP}} to force copy a big cell into {{MemStoreLAB}}, which
should allocate a new not-pooled chunk.
{code:java}
359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
360 switch (chunkType) {
361 case INDEX_CHUNK:
362 case DATA_CHUNK:
363 Chunk c = this.chunkCreator.getChunk(chunkType);
364 chunks.add(c.getId());
365 return c;
366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
367 default:
368 return null;
369 }
370 }
{code}
Even though fortunately this {{IndexType}} mismatch does not cause bug now
because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not invoked by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than
chunkSize,but I think it is dangerous to not specify the {{IndexType}} if we
want to refactor or add new functionalities in the future. We would better to
specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.
When thinking this problem more, I think we could remove the {{IndexType}} from
{{ChunkCreator}} completely to simplify the code, because when use
{{MemStoreLABImpl}},
the {{IndexType}} could only be {{IndexType.CHUNK_MAP}} ,just as following
line 106 in {{MemStoreLABImpl}} ctor.
{code:java}
96 public MemStoreLABImpl(Configuration conf) {
97 dataChunkSize = conf.getInt(CHUNK_SIZE_KEY, CHUNK_SIZE_DEFAULT);
98 maxAlloc = conf.getInt(MAX_ALLOC_KEY, MAX_ALLOC_DEFAULT);
99 this.chunkCreator = ChunkCreator.getInstance();
100 // if we don't exclude allocations >CHUNK_SIZE, we'd infiniteloop on one!
101 Preconditions.checkArgument(maxAlloc <= dataChunkSize,
102 MAX_ALLOC_KEY + " must be less than " + CHUNK_SIZE_KEY);
103
104 // if user requested to work with MSLABs (whether on- or off-heap),
then the
105 // immutable segments are going to use CellChunkMap as their index
106 idxType = CompactingMemStore.IndexType.CHUNK_MAP;
107 }
{code}
And because the get {{chunk}} from {{ChunkCreator}} is only used by
{{MemStoreLABImpl}}, and only when we not use {{MemStoreLABImpl}} we could use
{{IndexType.ARRAY_MAP}}, so we can remove the {{IndexType}} from
{{ChunkCreator}} completely.
> Remove IndexType from ChunkCreator
> ----------------------------------
>
> Key: HBASE-26567
> URL: https://issues.apache.org/jira/browse/HBASE-26567
> Project: HBase
> Issue Type: Improvement
> Components: in-memory-compaction
> Affects Versions: 3.0.0-alpha-1, 2.4.8
> Reporter: chenglei
> Assignee: chenglei
> Priority: Major
>
> For {{CellChunkImmutableSegment}}, cells in data chunks is indexed in index
> chunks by ChunkID ,not by strong reference, so in order to prevent the data
> chunks in {{CellChunkMap}} from GC by JVM, HBASE-18375 put these data
> chunks in {{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or
> not the {{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}}
> depends on the data chunk is pooled or the index type is
> {{IndexType.CHUNK_MAP}}. So when we call {{ChunkCreator.getChunk}}, it is
> very important to specify the IndexType. {{ChunkCreator}} has following
> package-visible method, which does not require the {{IndexType}} parameter
> and use {{IndexType.ARRAY_MAP}}.
> {code:java}
> Chunk getChunk(ChunkType chunkType) {
> return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
> }
> {code}
> This method is used by following line 363 in
> {{MemStoreLABImpl.getNewExternalChunk}} now, which is used by
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} which index type is
> {{IndexType.CHUNK_MAP}} to force copy a big cell into {{MemStoreLAB}}, which
> should allocate a new not-pooled chunk.
> {code:java}
> 359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
> 360 switch (chunkType) {
> 361 case INDEX_CHUNK:
> 362 case DATA_CHUNK:
> 363 Chunk c = this.chunkCreator.getChunk(chunkType);
> 364 chunks.add(c.getId());
> 365 return c;
> 366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
> 367 default:
> 368 return null;
> 369 }
> 370 }
> {code}
> Even though fortunately this {{IndexType}} mismatch does not cause bug now
> because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not invoked by
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than
> chunkSize,but I think it is dangerous to not specify the {{IndexType}} if we
> want to refactor or add new functionalities in the future. We would better to
> specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.
> When thinking this problem more, I think we could remove the {{IndexType}}
> from {{ChunkCreator}} completely to simplify the code, because when use
> {{MemStoreLABImpl}},
> the {{IndexType}} could only be {{IndexType.CHUNK_MAP}} ,just as following
> line 106 in {{MemStoreLABImpl}} ctor.
> {code:java}
> 96 public MemStoreLABImpl(Configuration conf) {
> 97 dataChunkSize = conf.getInt(CHUNK_SIZE_KEY, CHUNK_SIZE_DEFAULT);
> 98 maxAlloc = conf.getInt(MAX_ALLOC_KEY, MAX_ALLOC_DEFAULT);
> 99 this.chunkCreator = ChunkCreator.getInstance();
> 100 // if we don't exclude allocations >CHUNK_SIZE, we'd infiniteloop on
> one!
> 101 Preconditions.checkArgument(maxAlloc <= dataChunkSize,
> 102 MAX_ALLOC_KEY + " must be less than " + CHUNK_SIZE_KEY);
> 103
> 104 // if user requested to work with MSLABs (whether on- or off-heap),
> then the
> 105 // immutable segments are going to use CellChunkMap as their index
> 106 idxType = CompactingMemStore.IndexType.CHUNK_MAP;
> 107 }
> {code}
> And because the get {{chunk}} from {{ChunkCreator}} is only used by
> {{MemStoreLABImpl}}, and only when we not use {{MemStoreLABImpl}} we could
> use {{IndexType.ARRAY_MAP}}, we can remove the {{IndexType}} from
> {{ChunkCreator}} completely.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)