[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024910#comment-15024910
 ] 

stack commented on HBASE-13408:
-------------------------------

Thanks for the design doc update.

What do you lot think of the new 'principles'? (I am asking the authors.)

We go from "The data is kept in memory for as long as possible" to instead, 
"...[u]se the in-memory space effectively, by periodically compacting the 
memstore content."

We also talk of 'compaction'. What do we mean by compaction? The removal of 
data that has been overwritten? Or making the data take up less space in 
RAM? The latter is a fine objective, but what are you thinking? At first there 
will be no compaction; we'll just introduce the segments and the pipeline. Later 
we'll want to add in 'compaction': we'll expend CPU to change the skip list to a 
more compact format. What should that format be? We posit hfile, or the blocks 
that will go into hfiles. Does that make sense as an in-memory data structure? 
If it does, good. If not, what should the in-memory compacted format be? Have 
you done any exploration here?
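
To make the two readings of 'compaction' concrete, here is a minimal sketch. 
It is not taken from the patch. It assumes a hypothetical Cell (row, timestamp, 
value; not the HBase Cell API) and a segment backed by a plain 
ConcurrentSkipListSet. compact() does both things at once: it drops overwritten 
versions and copies the survivors out of the skip list into a flat sorted 
array. It ignores column qualifiers, deletes, max-versions and open scanners, 
all of which a real implementation would have to respect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

public class SegmentCompactionSketch {

    /** Hypothetical cell for illustration only; this is not the HBase Cell API. */
    public static final class Cell {
        public final String row;
        public final long ts;
        public final String value;

        public Cell(String row, long ts, String value) {
            this.row = row;
            this.ts = ts;
            this.value = value;
        }
    }

    /** New mutable segment, ordered by row ascending, then timestamp descending. */
    public static ConcurrentSkipListSet<Cell> newSegment() {
        return new ConcurrentSkipListSet<>((a, b) -> {
            int byRow = a.row.compareTo(b.row);
            return byRow != 0 ? byRow : Long.compare(b.ts, a.ts);
        });
    }

    /**
     * Copies a skip-list segment into a flat sorted array, keeping only the
     * newest cell per row. Assumption: one version per row is enough.
     */
    public static Cell[] compact(ConcurrentSkipListSet<Cell> segment) {
        List<Cell> survivors = new ArrayList<>();
        String lastRow = null;
        for (Cell cell : segment) {
            // The newest version of each row is iterated first; skip the rest.
            if (!cell.row.equals(lastRow)) {
                survivors.add(cell);
                lastRow = cell.row;
            }
        }
        return survivors.toArray(new Cell[0]);
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Cell> segment = newSegment();
        segment.add(new Cell("r1", 1L, "a"));
        segment.add(new Cell("r1", 2L, "b"));
        segment.add(new Cell("r1", 3L, "c"));
        segment.add(new Cell("r2", 1L, "x"));
        Cell[] compacted = compact(segment);
        System.out.println(segment.size() + " cells -> " + compacted.length + " cells");
    }
}
```

Whether a flat array, an hfile block, or something else is the right target 
format is exactly the open question; the array here is only a stand-in.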

Do we have a sense of how much advantage there is to be had 'compacting' 
segments in the pipeline?

How do we ensure this feature is of benefit 90% of the time, and not just for 
some exotic use case where most of the data is being overwritten and the column 
families are 'in memory'? Even then, do we have a measure of the improvement 
to be had?
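
One crude measure, offered only as a sketch: the fraction of writes in a 
segment that overwrite a key already present. The class and method names below 
are hypothetical, and the sketch assumes each write is identified by a single 
string key. A ratio near zero suggests compaction would reclaim little; a ratio 
near one is the high-churn case the description targets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OverwriteRatioSketch {

    /** Fraction of writes whose key was already written; 0.0 for no writes. */
    public static double overwriteRatio(List<String> writtenKeys) {
        if (writtenKeys.isEmpty()) {
            return 0.0;
        }
        Set<String> distinct = new HashSet<>(writtenKeys);
        return 1.0 - (double) distinct.size() / writtenKeys.size();
    }

    public static void main(String[] args) {
        // Four writes touching two distinct keys.
        List<String> writes = Arrays.asList("r1", "r2", "r1", "r1");
        System.out.println(overwriteRatio(writes));
    }
}
```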

Let me look at the patch (smile). In fact, the above questions come out of my 
trying to look at the patch. Thanks.



> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data accumulates in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
