[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422670#comment-15422670
 ] 

Anastasia Braginsky commented on HBASE-16421:
---------------------------------------------

A summary of the previous steps:

HBASE-14920 introduced a new MemStore variant, called CompactingMemStore. It 
also partitions the in-memory content into segments, which can be mutable or 
immutable. Periodically, the CompactingMemStore flushes the content of the 
mutable active segment into an immutable segment. Immutable segments are kept 
in memory in a compaction pipeline, where they are compacted (i.e. merged 
together, with duplicate cells eliminated).
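
As a rough illustration of the merge-with-deduplication step only, here is a 
minimal Java sketch. It is not the HBase implementation: SimpleCell, its 
fields, and compact() are hypothetical names, a cell is reduced to a string 
key plus a sequence number, and "duplicate" is assumed to mean "same key, 
older sequence number".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class PipelineCompactionSketch {
    /** Hypothetical, simplified cell: higher seq means a newer write. */
    static final class SimpleCell {
        final String key;
        final long seq;
        final String value;

        SimpleCell(String key, long seq, String value) {
            this.key = key;
            this.seq = seq;
            this.value = value;
        }
    }

    /**
     * Merges the cells of several immutable segments into one key-ordered
     * list, keeping only the newest cell for each key (assumed dedup rule).
     */
    static List<SimpleCell> compact(List<List<SimpleCell>> segments) {
        TreeMap<String, SimpleCell> newest = new TreeMap<>();
        for (List<SimpleCell> segment : segments) {
            for (SimpleCell cell : segment) {
                // Keep whichever cell has the higher sequence number.
                newest.merge(cell.key, cell, (a, b) -> a.seq >= b.seq ? a : b);
            }
        }
        return new ArrayList<>(newest.values());
    }
}
```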

HBASE-14921 introduced the concept of flattening segments in the pipeline. A 
flat implementation of the immutable segment's index (denoted CellArrayMap) 
serves as an alternative to ConcurrentSkipListMap. CellArrayMap is implemented 
as an ordered array, over which binary search is used to find a cell. 
CellArrayMap significantly reduces the memory footprint of the segment's index 
(compared to ConcurrentSkipListMap). Starting with HBASE-14921, the immutable 
segments in the compaction pipeline can be either compacted or flattened (i.e. 
the index is transformed from ConcurrentSkipListMap to CellArrayMap without 
compaction).
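
The flattening idea can be sketched in a few lines of Java. This is only an 
illustration under simplifying assumptions, not the CellArrayMap code: 
FlatIndexSketch is a hypothetical class, keys and values are plain strings, 
and the source skip list is assumed to be no longer mutated when it is 
flattened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

final class FlatIndexSketch {
    private final String[] keys;    // sorted, copied in skip-list order
    private final String[] values;  // values[i] belongs to keys[i]

    /** Flattens an (assumed immutable) skip list into two parallel arrays. */
    FlatIndexSketch(ConcurrentSkipListMap<String, String> source) {
        List<String> k = new ArrayList<>();
        List<String> v = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            k.add(e.getKey());
            v.add(e.getValue());
        }
        keys = k.toArray(new String[0]);
        values = v.toArray(new String[0]);
    }

    /** Binary search over the ordered array; returns null if absent. */
    String get(String key) {
        int idx = Arrays.binarySearch(keys, key);
        return idx >= 0 ? values[idx] : null;
    }

    int size() {
        return keys.length;
    }
}
```

The arrays hold one reference per entry and no per-node link objects, which 
is where the footprint saving over a skip list comes from in this sketch.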

This JIRA should hold all the changes required to introduce yet another 
variant of the immutable segment's index (denoted CellChunkMap), mostly 
suited to off-heaping. CellChunkMap is a byte array in which each cell 
reference is represented with up to 12 bytes. Binary search is also used to 
search through CellChunkMap. Each cell is represented by (1) chunk id - a 
reference to the chunk of memory holding the cell's data; (2) offset - from 
the start of the chunk; (3) length - of the cell's data. The CellChunkMap uses 
even fewer bytes per cell (compared to CellArrayMap) and is also the only 
variant suitable for off-heaping, because it is naturally serialized. The 
CellChunkMap can serve as an index only for cells allocated on chunks (from 
MemStoreLAB).
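
To make the layout concrete, here is a minimal Java sketch of such an index. 
It is not the prototyped CellChunkMap: ChunkIndexSketch and its methods are 
hypothetical, each entry is assumed to be exactly three 4-byte ints (chunk 
id, offset, length), chunks are plain byte arrays, and the whole cell's bytes 
are treated as its sort key.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ChunkIndexSketch {
    // Assumption: 4-byte chunk id + 4-byte offset + 4-byte length.
    static final int ENTRY_BYTES = 12;

    private final ByteBuffer index;  // serialized entries, sorted by cell
    private final byte[][] chunks;   // chunk id -> chunk memory
    private final int count;

    /** sortedRefs[i] = {chunkId, offset, length}, already in cell order. */
    ChunkIndexSketch(byte[][] chunks, int[][] sortedRefs) {
        this.chunks = chunks;
        this.count = sortedRefs.length;
        this.index = ByteBuffer.allocate(count * ENTRY_BYTES);
        for (int[] ref : sortedRefs) {
            index.putInt(ref[0]).putInt(ref[1]).putInt(ref[2]);
        }
    }

    /** Decodes entry i and copies the cell's bytes out of its chunk. */
    byte[] cellAt(int i) {
        int base = i * ENTRY_BYTES;
        int chunkId = index.getInt(base);
        int offset = index.getInt(base + 4);
        int length = index.getInt(base + 8);
        return Arrays.copyOfRange(chunks[chunkId], offset, offset + length);
    }

    /** Binary search over the entries; returns the position or -1. */
    int find(byte[] target) {
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = Arrays.compareUnsigned(cellAt(mid), target);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }
}
```

Because the index holds only integers and no object references, the same 
bytes could in principle live in a direct (off-heap) buffer; the sketch uses 
a heap buffer for simplicity.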

For now we see the following candidates for the sub-JIRAs:
-- The CellChunkMap implementation itself (already prototyped but not 
integrated yet)
-- Related design issues (some refactoring of MemStoreChunkPool, MSLAB and 
HeapMSLAB)
-- Flattening to CellChunkMap (integrating with the new code from Anoop Sam 
John and ramkrishna.s.vasudevan)
-- The Big Cells issue (cells that are bigger than the chunk size)

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-16421
>                 URL: https://issues.apache.org/jira/browse/HBASE-16421
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Anastasia Braginsky
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
