[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031369#comment-15031369
 ] 

stack commented on HBASE-13408:
-------------------------------

Thanks [~ebortnik]

On 1., do you have a particular use that you are targeting? A particular 
app/user?

On...

bq. The admin should know what he is doing.

I think for many hbase deploys, the admin knows what they are doing, but finding 
all the features and configs that are available is work that many admins do not 
have the time for. Even when they find a config, unless there is lots of hearsay 
that enabling it helps their case and that others have had good success with 
it, they are often reluctant to enable the feature themselves (usually because 
they are overworked supporting hbase and many other subsystems more than for 
any other reason... or because there is lots of stuff in hbase and the hadoop 
space that has been added but is not well tested...).
Hence my argument for having the feature always on. If they are doing lots of 
overwrite, they get an improvement. If not, could we have it so that having it 
enabled costs nothing? (Tall order, I know.) If always on, it will get exercise 
and we'll find the bugs. BTW, it is ok if it eats into cache, especially if the 
StoreSegment soon converts to a data structure that is fast to look up.

Otherwise, 1. sounds great.

2. sounds great too. It will make getting the work contributed easier on all 
concerned. Could we replace Snapshot with StoreSegment?  Yeah, one of us 
could maybe have a go at step 3.

Good by you too [~anoop.hbase]?

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.    The data is kept in memory for as long as possible
> 2.    Memstore data is either compacted or in the process of being compacted 
> 3.    Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
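
To make the duplicate-entries point in the description above concrete, here is a minimal sketch of what compacting an in-memory segment could look like. It is not the code from the attached patches: the MemstoreCompactionSketch class, its Entry type, and the compact method are hypothetical names made up for illustration, and the sketch assumes only the newest version of each row is kept (it ignores column qualifiers, max-versions settings, and delete markers).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MemstoreCompactionSketch {

    /** Hypothetical stand-in for a stored cell; this is not HBase's Cell interface. */
    static final class Entry {
        final String row;
        final long timestamp;
        final String value;

        Entry(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Returns a new list holding only the newest entry per row, sorted by row key.
     * Assumption: one version per row is enough for the workload.
     */
    static List<Entry> compact(List<Entry> segment) {
        Map<String, Entry> newest = new TreeMap<>();
        for (Entry e : segment) {
            Entry seen = newest.get(e.row);
            if (seen == null || e.timestamp > seen.timestamp) {
                newest.put(e.row, e);
            }
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        // A hot row updated three times, plus one cold row.
        List<Entry> segment = new ArrayList<>();
        segment.add(new Entry("hot", 1L, "v1"));
        segment.add(new Entry("cold", 1L, "c1"));
        segment.add(new Entry("hot", 2L, "v2"));
        segment.add(new Entry("hot", 3L, "v3"));

        List<Entry> compacted = compact(segment);
        System.out.println(segment.size() + " entries before, "
                + compacted.size() + " after");
    }
}
```

In a high-churn workload most entries are overwrites of a few hot rows, so a pass like this frees memory without a flush to disk, which is the effect the description is after.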



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
