[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638791#comment-14638791 ]

Eshcar Hillel commented on HBASE-13408:
---------------------------------------

I studied the flush-by-column-family feature (HBASE-10201). I think it will 
help us support WAL truncation in the compacting memstore. 
[~Apache9], I would appreciate it if you could confirm that this should work.

In the current implementation, when a region flushes a store, the sequence id 
previously associated with this store in the WAL's 
oldestUnflushedStoreSequenceIds set is removed. The first put operation after 
the flush installs a new sequence id for the store.
The WAL uses this bookkeeping to decide which WAL files can be archived 
(WAL truncation).
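To make the existing bookkeeping concrete, here is a minimal sketch of it. It is a simplified model, not the actual HBase WAL classes: the class and method names (WalBookkeeping, onAppend, onFlush, canArchive) are hypothetical, and the archiving rule is reduced to "a file can go once its highest sequence id is below every store's oldest unflushed id".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the per-store WAL bookkeeping.
// Names are illustrative only; this is not the HBase WAL API.
public class WalBookkeeping {
    // store (column family) name -> oldest sequence id not yet flushed
    private final Map<String, Long> oldestUnflushedStoreSequenceIds = new HashMap<>();

    // Called for every edit appended to the WAL. Only the first edit after a
    // flush installs an entry; later edits keep the older value in place.
    public void onAppend(String store, long seqId) {
        oldestUnflushedStoreSequenceIds.putIfAbsent(store, seqId);
    }

    // Called when the region flushes a store: its entry is removed.
    public void onFlush(String store) {
        oldestUnflushedStoreSequenceIds.remove(store);
    }

    // A WAL file can be archived once all its edits are flushed, i.e. its
    // highest sequence id is below the oldest unflushed id of every store.
    public boolean canArchive(long highestSeqIdInFile) {
        for (long oldest : oldestUnflushedStoreSequenceIds.values()) {
            if (highestSeqIdInFile >= oldest) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        WalBookkeeping wal = new WalBookkeeping();
        wal.onAppend("cf1", 1);
        wal.onAppend("cf2", 2);
        wal.onAppend("cf1", 3); // cf1 keeps oldest unflushed id 1
        System.out.println(wal.canArchive(3)); // false: cf1 and cf2 unflushed
        wal.onFlush("cf1");
        System.out.println(wal.canArchive(3)); // false: cf2 still holds id 2
        wal.onFlush("cf2");
        wal.onAppend("cf1", 4); // first put after the flush installs id 4
        System.out.println(wal.canArchive(3)); // true
    }
}
```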

For the compacting memstore we would like to keep the sequence id in the WAL's 
oldestUnflushedStoreSequenceIds set even after a flush is invoked. Rather than 
having the flush remove it, the memstore compaction thread will be responsible 
for setting an approximation of the correct sequence id for the store in the 
set.
To this end, the compacting memstore maintains a mapping from timestamp to 
region sequence number (the same sequence numbers that are attached to WAL 
edits). Whenever a flush is invoked on a compacting memstore, it adds the pair 
(current time, current sequence number) to this mapping. 
As a by-product of the memstore compaction, the minimal timestamp that is still 
present in the memstore is computed. This timestamp is then used to identify 
the maximal sequence id in the timestamp->seqId mapping for which no entries 
are left in the memstore. Finally, the compaction thread uses this approximated 
sequence number to update the oldestUnflushedStoreSequenceIds set.
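A minimal sketch of the timestamp->seqId mapping might look like the following. It assumes cell timestamps are assigned by the server at write time and grow together with sequence ids, and that the minimal timestamp left in the memstore never decreases; the class and method names are hypothetical. Using lowerEntry (strictly smaller key) matters: a flush recorded at exactly the minimal timestamp may still have cells left with sequence ids up to its value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the timestamp -> sequence id mapping kept by a
// compacting memstore. Assumes server-assigned, increasing cell timestamps.
public class TsToSeqIdMap {
    // flush time (ms) -> region sequence number current at that flush
    private final TreeMap<Long, Long> tsToSeqId = new TreeMap<>();

    // Called whenever a flush is invoked on the compacting memstore.
    public void recordFlush(long nowMillis, long currentSeqId) {
        tsToSeqId.put(nowMillis, currentSeqId);
    }

    // Called after an in-memory compaction with the minimal cell timestamp
    // still present. Returns the largest recorded sequence id whose flush time
    // is strictly earlier than that timestamp (so none of its cells remain),
    // or -1 if there is none. Entries older than the match are dropped,
    // assuming the minimal timestamp never decreases.
    public long maxSeqIdWithNoEntriesLeft(long minTsInMemstore) {
        Map.Entry<Long, Long> e = tsToSeqId.lowerEntry(minTsInMemstore);
        if (e == null) {
            return -1L;
        }
        tsToSeqId.headMap(e.getKey(), false).clear();
        return e.getValue();
    }

    public static void main(String[] args) {
        TsToSeqIdMap m = new TsToSeqIdMap();
        m.recordFlush(100L, 10L);
        m.recordFlush(200L, 25L);
        m.recordFlush(300L, 40L);
        // Oldest surviving cell has timestamp 250: everything up to the
        // flush at t=200 (sequence ids <= 25) has been compacted away.
        long s = m.maxSeqIdWithNoEntriesLeft(250L);
        System.out.println(s); // 25
        // Assumption: the store's entry in oldestUnflushedStoreSequenceIds
        // would then be advanced to s + 1.
        System.out.println(s + 1); // 26
    }
}
```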

This way the WAL is truncated with some delay relative to the real oldest 
unflushed sequence number, but the memory overhead is fairly small (only a 
small ts->seq map is added to the memstore) compared to a solution that stores 
a sequence number with each cell in the memstore and uses it to find the 
*exact* oldest unflushed sequence id.
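The trade-off can be illustrated with a small, hypothetical comparison: the exact variant needs a sequence id per cell, while the approximate one needs only the flush-time map and the minimal timestamp (computed here by a plain scan for brevity, rather than as a compaction by-product). The Cell type and all values below are illustrative, and the same timestamp assumption as above applies.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical comparison of exact vs. approximated oldest unflushed seq id.
public class WalTruncationLag {
    // Illustrative cell: write timestamp plus the sequence id of its WAL edit
    // (not HBase's Cell interface).
    static final class Cell {
        final long ts;
        final long seqId;

        Cell(long ts, long seqId) {
            this.ts = ts;
            this.seqId = seqId;
        }
    }

    // Exact: requires a sequence id stored with every cell in the memstore.
    static long exactOldestUnflushed(List<Cell> memstore) {
        long min = Long.MAX_VALUE;
        for (Cell c : memstore) {
            min = Math.min(min, c.seqId);
        }
        return min;
    }

    // Approximate: only the ts -> seqId map and the minimal timestamp are used.
    // Returns 0 (nothing can be truncated) if no flush precedes the oldest cell.
    static long approxOldestUnflushed(TreeMap<Long, Long> tsToSeqId, List<Cell> memstore) {
        long minTs = Long.MAX_VALUE;
        for (Cell c : memstore) {
            minTs = Math.min(minTs, c.ts);
        }
        Map.Entry<Long, Long> e = tsToSeqId.lowerEntry(minTs);
        return e == null ? 0L : e.getValue() + 1;
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> tsToSeqId = new TreeMap<>();
        tsToSeqId.put(100L, 10L); // flush at t=100, sequence id 10
        tsToSeqId.put(200L, 25L); // flush at t=200, sequence id 25
        // Cells surviving the in-memory compaction: (ts, seqId)
        List<Cell> memstore = List.of(new Cell(150L, 18L), new Cell(210L, 27L));
        System.out.println(exactOldestUnflushed(memstore)); // 18
        System.out.println(approxOldestUnflushed(tsToSeqId, memstore)); // 11
    }
}
```

The approximate value (11) never exceeds the exact one (18) under the stated assumption, so truncation can only lag behind: WAL files with edits up to id 10 can be archived, whereas the exact approach would allow up to id 17.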

What say you?

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are 
> triggered and the more frequently data sinks to disk, slowing down retrieval 
> of data, even very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacting memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
