[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085376#comment-15085376
 ] 

Eshcar Hillel commented on HBASE-15016:
---------------------------------------

There are four decisions to make: (1) when to do an in-memory flush, (2) when 
to do in-memory compaction, (3) when to flush to disk, and (4) which stores to 
flush to disk.

One piece of feedback we got when working on HBASE-13408 was that decisions 1 
and 2 should be encapsulated and managed within the memstore. This is 
reasonable, since the memstore holds all the information about sizes, 
duplicates, etc.
What you are suggesting now is to add a ‘warning’ message, sent by the region 
to its stores, that would trigger an in-memory flush and/or a compaction.

Here is a scenario we need to avoid: the compaction pipeline holds 80MB, and 
whenever the active segment reaches only a few MBs, a warning message is sent 
and triggers an in-memory flush and compaction. A big segment (80MB) is then 
merged with a small segment (3MB), creating a big segment again (say, around 
80MB after duplicates are removed). If this happens over and over again, it 
wastes CPU time and also generates a lot of work for the GC. [This is somewhat 
similar to the small-files problem that FlushLargeStoresPolicy tries to 
resolve.]
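One way to avoid the repeated big-plus-small merge is to let the memstore itself ignore a warning until the active segment is large enough relative to the pipeline, which keeps decisions 1 and 2 inside the memstore. The Java sketch below is only an illustration under assumptions: the class InMemoryCompactionTrigger, the method shouldFlushInMemory and the ratio threshold are hypothetical names, not part of HBase or of the attached patches.

```java
/**
 * Hypothetical sketch, not an HBase API: decides whether a region "warning"
 * should actually cause an in-memory flush (and compaction) of the active
 * segment into the compaction pipeline.
 */
public final class InMemoryCompactionTrigger {

  // Assumed tuning knob: minimum size of the active segment relative to the
  // pipeline before merging is considered worth the CPU and GC cost.
  private final double minActiveToPipelineRatio;

  public InMemoryCompactionTrigger(double minActiveToPipelineRatio) {
    if (!(minActiveToPipelineRatio > 0)) {
      throw new IllegalArgumentException("ratio must be positive");
    }
    this.minActiveToPipelineRatio = minActiveToPipelineRatio;
  }

  /** Returns true if the active segment is big enough to be worth merging. */
  public boolean shouldFlushInMemory(long activeSegmentBytes, long pipelineBytes) {
    if (activeSegmentBytes <= 0) {
      return false; // nothing to flush
    }
    if (pipelineBytes <= 0) {
      return true; // empty pipeline: no big segment gets rewritten
    }
    return (double) activeSegmentBytes / pipelineBytes >= minActiveToPipelineRatio;
  }

  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    InMemoryCompactionTrigger trigger = new InMemoryCompactionTrigger(0.25);
    // Scenario from the comment: 3MB active segment vs. 80MB pipeline.
    System.out.println(trigger.shouldFlushInMemory(3 * mb, 80 * mb));  // prints false
    System.out.println(trigger.shouldFlushInMemory(24 * mb, 80 * mb)); // prints true
  }
}
```

With a ratio of 0.25, the 3MB-into-80MB case from the scenario is skipped, while a 24MB active segment would be merged. Using a ratio rather than a fixed size keeps the check proportional to the pipeline, so a growing pipeline automatically demands a larger active segment before paying for another merge.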

Another issue: say you have several stores in a region, including at least one 
default memstore (A) and one compacting memstore (B). Assume both exceed 16MB 
and the other memstores are below 16MB. When the region triggers a flush to 
disk, the current policy chooses to flush both A and B. Flushing A is 
reasonable, since there is no other way to reduce its size; but is it 
reasonable to flush B? If B stays in memory longer, it has a chance to shrink 
without being flushed to disk.
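A flush-selection policy could express this by deferring compacting memstores while a default memstore alone is large enough to flush. The sketch below is a hypothetical illustration, not FlushLargeStoresPolicy or any HBase API; DeferCompactingFlushPolicy, StoreInfo and both fallback rules are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not HBase's FlushLargeStoresPolicy: picks the stores to
 * flush to disk, deferring compacting memstores while a default memstore
 * alone is large enough to flush.
 */
public final class DeferCompactingFlushPolicy {

  /** Illustrative stand-in for per-store information. */
  public static final class StoreInfo {
    final String name;
    final long memstoreBytes;
    final boolean compacting; // true if in-memory compaction may still shrink it

    public StoreInfo(String name, long memstoreBytes, boolean compacting) {
      this.name = name;
      this.memstoreBytes = memstoreBytes;
      this.compacting = compacting;
    }
  }

  private final long flushSizeLowerBound; // 16MB in the scenario above

  public DeferCompactingFlushPolicy(long flushSizeLowerBound) {
    this.flushSizeLowerBound = flushSizeLowerBound;
  }

  /** Returns the names of the stores to flush to disk. */
  public List<String> selectStoresToFlush(List<StoreInfo> stores) {
    List<String> large = new ArrayList<>();
    List<String> largeDefault = new ArrayList<>();
    List<String> all = new ArrayList<>();
    for (StoreInfo s : stores) {
      all.add(s.name);
      if (s.memstoreBytes > flushSizeLowerBound) {
        large.add(s.name);
        if (!s.compacting) {
          largeDefault.add(s.name);
        }
      }
    }
    if (!largeDefault.isEmpty()) {
      return largeDefault; // default memstores cannot shrink any other way
    }
    if (!large.isEmpty()) {
      return large; // only compacting stores are large: flush them anyway
    }
    return all; // assumed fallback: no large store, flush everything
  }

  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    DeferCompactingFlushPolicy policy = new DeferCompactingFlushPolicy(16 * mb);
    List<StoreInfo> stores = new ArrayList<>();
    stores.add(new StoreInfo("A", 20 * mb, false)); // default memstore
    stores.add(new StoreInfo("B", 18 * mb, true));  // compacting memstore
    stores.add(new StoreInfo("C", 4 * mb, false));  // small store
    System.out.println(policy.selectStoresToFlush(stores)); // prints [A]
  }
}
```

The fallbacks are the important design choice: B is only deferred when flushing A already frees memory, and if the compacting memstores are the only large ones they are flushed anyway, so deferral never leaves the region without something to flush.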

I'm just mentioning these issues so you can consider them when preparing your 
patch.

> StoreServices facility in Region
> --------------------------------
>
>                 Key: HBASE-15016
>                 URL: https://issues.apache.org/jira/browse/HBASE-15016
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, Regioncounters.pdf
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allow 
> eliminating data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtask introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
