[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634999#comment-14634999 ]

Eshcar Hillel commented on HBASE-13408:
---------------------------------------

Is your first concern "how can an admin safely decide on the number of regions 
per region server if the memory footprint of a region may be bigger than the 
flush size?"
First, this can also happen with the default memstore implementation; this is 
why the blocking flush size is defined, and we make sure not to cross this 
upper limit even with the compacted memstore implementation.
Second, while less trivial, it is still possible to come up with a reasonable 
computation if you have an upper limit on the number of regions with a 
compacted memstore at any point in time.

Regarding your second question, the compaction pipeline is composed of memstore 
segments (1 or more). Each memstore segment has a cell set; currently this is 
the same data structure as in the active segment, namely a skip list. If found 
useful, it is possible to change the format in which the cells are stored in 
the pipeline after compaction.
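To make the structure concrete, here is a minimal sketch of the pipeline described above. The class names are mine and cells are reduced to string key/value pairs for brevity; this is not the code in the patch.

{code:java}
import java.util.LinkedList;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// An immutable memstore segment in the pipeline; its cell set uses the same kind
// of data structure as the active segment, namely a skip list.
class MemStoreSegment {
  final ConcurrentSkipListMap<String, String> cellSet = new ConcurrentSkipListMap<>();

  void add(String rowKey, String value) {
    cellSet.put(rowKey, value);
  }
}

// The compaction pipeline: one or more segments, newest first.
class CompactionPipeline {
  private final LinkedList<MemStoreSegment> segments = new LinkedList<>();

  void pushHead(MemStoreSegment segment) {
    segments.addFirst(segment);
  }

  // In-memory compaction: merge all segments into one, keeping only the newest
  // value per row key so duplicate entries are eliminated.
  MemStoreSegment compact() {
    MemStoreSegment merged = new MemStoreSegment();
    // Iterate oldest-to-newest so newer segments overwrite older duplicates.
    for (int i = segments.size() - 1; i >= 0; i--) {
      for (Map.Entry<String, String> e : segments.get(i).cellSet.entrySet()) {
        merged.add(e.getKey(), e.getValue());
      }
    }
    segments.clear();
    segments.addFirst(merged);
    return merged;
  }
}
{code}

The point of the sketch is that the cell-set format is an implementation detail of the segment; storing the cells in a flatter, more compact format after compaction would only touch the segment class.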

[~anoop.hbase] I hope this answers your questions.

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to the block cache. This may result 
> in underutilization of memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even if it is very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
