[
https://issues.apache.org/jira/browse/HBASE-26567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chenglei updated HBASE-26567:
-----------------------------
Description:
For {{CellChunkMap}}, cells in data chunks in is indexed by index chunks by
ChunkID ,not by strong reference, so in order to prevent the data chunks in
{{CellChunkMap}} from GC by JVM, HBASE-18375 put these data chunks in
{{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the
{{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on
the data chunk is pooled or the index type of {{CompactingMemStore}} is
{{IndexType.CHUNK_MAP}}. So when we
call {{ChunkCreator.getChunk}}, it is very important to specify the IndexType.
{{ChunkCreator}} has following package-visible method, which does not require
the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}.
{code:java}
Chunk getChunk(ChunkType chunkType) {
return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
}
{code}
This method is used by following line 363 in
{{MemStoreLABImpl.getNewExternalChunk}} now, which is used by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} to force copy a big cell into
{{MemStoreLAB}},so the index type obviously should be {{IndexType.CHUNK_MAP}}.
{code:java}
359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
360 switch (chunkType) {
361 case INDEX_CHUNK:
362 case DATA_CHUNK:
363 Chunk c = this.chunkCreator.getChunk(chunkType);
364 chunks.add(c.getId());
365 return c;
366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
367 default:
368 return null;
369 }
370 }
{code}
Even though fortunately this IndexType mismatch does not cause bug now because
{{MemStoreLABImpl.getNewExternalChunk}} in fact is not called by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than
chunkSize,but I think it is dangerous to use this
{{ChunkCreator.getChunk(ChunkType chunkType)}} method if we want to refactor or
add new functionalities in the future. We would better to specify the
{{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.
My fix is as follows:
1. Remove the error-prone {{ChunkCreator.getChunk(ChunkType chunkType)}},
{{ChunkCreator}} does not determine the {{IndexType}} itself at all, and let
{{MemStoreLAB}} or {{CompactingMemStore}} who invokes the {{ChunkCreator}} to
determine the {{IndexType}} explicitly.
2. Unify the determination of which {{IndexType}} to use into
{{CompactingMemStore}}.
was:
As HBASE-18375 said, data chunks in {{CellChunkMap}} is indexed by index chunks
by ChunkID ,not by strong reference, so in order to prevent the data chunks in
{{CellChunkMap}} from gc by JVM we put these data chunks in
{{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the
{{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on
the data chunk is pooled or the index type of {{CompactingMemStore}} is
{{IndexType.CHUNK_MAP}}. So when we
call {{ChunkCreator.getChunk}}, it is very important to specify the IndexType.
{{ChunkCreator}} has following package-visible method, which does not require
the the {{IndexType}} and use {{IndexType.ARRAY_MAP}}
{code:java}
Chunk getChunk(ChunkType chunkType) {
return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
}
{code}
This method is used by following line 363 in
{{MemStoreLABImpl.getNewExternalChunk}} now, which is used by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} to force copy a big cell into
{{MemStoreLAB}},so the index type obviously should be {{IndexType.CHUNK_MAP}}.
{code:java}
359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
360 switch (chunkType) {
361 case INDEX_CHUNK:
362 case DATA_CHUNK:
363 Chunk c = this.chunkCreator.getChunk(chunkType);
364 chunks.add(c.getId());
365 return c;
366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
367 default:
368 return null;
369 }
370 }
{code}
Even though fortunately this IndexType mismatch does not cause bug now because
{{MemStoreLABImpl.getNewExternalChunk}} in fact is not called by
{{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than
chunkSize,but I think it is dangerous to use this
{{ChunkCreator.getChunk(ChunkType chunkType)}} method if we refactor or add new
functionality in the future. We would better to specify the {{IndexType}}
explicitly when we use {{ChunkCreator.getChunk}}.
My fix is as follows:
1. Remove the error-prone {{ChunkCreator.getChunk(ChunkType chunkType)}},
{{ChunkCreator}} does not determine the {{IndexType}} itself at all, and let
{{MemStoreLAB}} or {{CompactingMemStore}} who invokes the {{ChunkCreator}} to
determine the {{IndexType}} explicitly.
2. Unify the determination of which {{IndexType}} to use into
{{CompactingMemStore}}.
> Remove high-risk ChunkCreator.getChunk(ChunkType chunkType) method
> ------------------------------------------------------------------
>
> Key: HBASE-26567
> URL: https://issues.apache.org/jira/browse/HBASE-26567
> Project: HBase
> Issue Type: Improvement
> Components: in-memory-compaction
> Affects Versions: 3.0.0-alpha-1, 2.4.8
> Reporter: chenglei
> Priority: Major
>
> For {{CellChunkMap}}, cells in data chunks in is indexed by index chunks by
> ChunkID ,not by strong reference, so in order to prevent the data chunks in
> {{CellChunkMap}} from GC by JVM, HBASE-18375 put these data chunks in
> {{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the
> {{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on
> the data chunk is pooled or the index type of {{CompactingMemStore}} is
> {{IndexType.CHUNK_MAP}}. So when we
> call {{ChunkCreator.getChunk}}, it is very important to specify the
> IndexType. {{ChunkCreator}} has following package-visible method, which does
> not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}.
> {code:java}
> Chunk getChunk(ChunkType chunkType) {
> return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType);
> }
> {code}
> This method is used by following line 363 in
> {{MemStoreLABImpl.getNewExternalChunk}} now, which is used by
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} to force copy a big cell into
> {{MemStoreLAB}},so the index type obviously should be {{IndexType.CHUNK_MAP}}.
> {code:java}
> 359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) {
> 360 switch (chunkType) {
> 361 case INDEX_CHUNK:
> 362 case DATA_CHUNK:
> 363 Chunk c = this.chunkCreator.getChunk(chunkType);
> 364 chunks.add(c.getId());
> 365 return c;
> 366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size
> 367 default:
> 368 return null;
> 369 }
> 370 }
> {code}
> Even though fortunately this IndexType mismatch does not cause bug now
> because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not called by
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than
> chunkSize,but I think it is dangerous to use this
> {{ChunkCreator.getChunk(ChunkType chunkType)}} method if we want to refactor
> or add new functionalities in the future. We would better to specify the
> {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}.
> My fix is as follows:
> 1. Remove the error-prone {{ChunkCreator.getChunk(ChunkType chunkType)}},
> {{ChunkCreator}} does not determine the {{IndexType}} itself at all, and let
> {{MemStoreLAB}} or {{CompactingMemStore}} who invokes the {{ChunkCreator}} to
> determine the {{IndexType}} explicitly.
> 2. Unify the determination of which {{IndexType}} to use into
> {{CompactingMemStore}}.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)