[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631385#comment-14631385
 ] 

stack commented on HBASE-13408:
-------------------------------

I left some comments on rb but should have read the design first.

I read the design and had these few questions (design is nicely written):

FYI, Flush-by-column-family is on by default in 1.2 HBase. Will you need to do 
anything to accommodate this?

You say "In high-churn workloads, compacting the memstore can help maintain the 
data in memory, and thereby speed up data retrieval." Are the pipeline entries 
still skiplist sets? What is the compacted representation? Is it still a 
skiplist? Skip lists get slow, especially as they grow large. I was wondering 
if you saw any speedup simply because you have many small skip lists rather 
than one big one?
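To make the segmentation question concrete, here is a minimal Java sketch (class and method names are hypothetical, not HBase code) of a pipeline of small `ConcurrentSkipListMap` segments: each read probes the newest segment first, so each probe is O(log m) on a small map rather than O(log n) on one big one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: a memstore "pipeline" of small, sealed skip-list
// segments, newest first. get() probes each segment in order and returns
// the first (most recent) hit.
public class SegmentPipeline {
    private final List<ConcurrentSkipListMap<String, String>> segments = new ArrayList<>();

    // Push a new (already sealed) segment to the front of the pipeline.
    public void addSegment(ConcurrentSkipListMap<String, String> seg) {
        segments.add(0, seg);
    }

    // Read path: the newest segment wins, mimicking version precedence.
    public String get(String key) {
        for (ConcurrentSkipListMap<String, String> seg : segments) {
            String v = seg.get(key);
            if (v != null) {
                return v;
            }
        }
        return null;
    }
}
```

This is only meant to illustrate why the speedup might come from segment size rather than from the compacted representation itself.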

"Therefore, we suggest applying this optimization only to in-memory column 
families." In testing, did you find that the overhead slows us down otherwise?

I asked on rb what the threading model is. Is there a new thread per Store 
memstore? 

Is the new force-flush-size a new config? I wasn't following why we need it. If 
the size of the current set plus the pipeline is above the max size, flush? I 
wasn't clear on the need for 2.5.
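For clarity, the simpler rule I have in mind is sketched below (a hypothetical illustration with made-up names, not the design's actual code): flush whenever the active set plus the compaction pipeline exceed the memstore limit, with no separate force-flush-size config.

```java
// Hypothetical sketch of the simpler flush rule being asked about:
// flush when the mutable active set plus the compaction pipeline
// together exceed the memstore limit.
public class FlushDecision {
    private final long maxMemstoreSize;

    public FlushDecision(long maxMemstoreSize) {
        this.maxMemstoreSize = maxMemstoreSize;
    }

    // activeSize: bytes in the current mutable set;
    // pipelineSize: bytes across compacted/compacting segments.
    public boolean shouldFlush(long activeSize, long pipelineSize) {
        return activeSize + pipelineSize > maxMemstoreSize;
    }
}
```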

Is memstoresegment our old snapshot? Does it have facility beyond the old 
snapshot?

Thanks


> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are 
> triggered and the more frequently data sinks to disk, slowing down retrieval 
> even of very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
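The quoted principle of keeping data compacted in memory can be sketched as follows (an illustrative Java fragment with hypothetical types; the real memstore stores Cells keyed by row/family/qualifier/timestamp, as described in the attached design doc). Compacting merges segments so that only the latest value per row survives, which is what saves memory when hot rows are continuously updated.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch of in-memory compaction: merge segments oldest to
// newest so that newer entries overwrite older duplicates, keeping only
// the latest value per row. (Hypothetical types, not HBase code.)
public class InMemoryCompaction {
    @SafeVarargs
    public static ConcurrentSkipListMap<String, String> compact(
            ConcurrentSkipListMap<String, String>... segments) {
        ConcurrentSkipListMap<String, String> out = new ConcurrentSkipListMap<>();
        // Iterate oldest to newest so newer segments win on duplicate rows.
        for (ConcurrentSkipListMap<String, String> seg : segments) {
            out.putAll(seg);
        }
        return out;
    }
}
```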



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
