[ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253829#comment-15253829
 ] 

Anoop Sam John commented on HBASE-14920:
----------------------------------------

bq.public final static double IN_MEMORY_FLUSH_THRESHOLD_FACTOR = 0.9;

So we check, after every cell addition to the active segment, whether it is 
worth doing an in-memory flush now. For that size calculation, why do we 
consider FlushLargeStoresPolicy.DEFAULT_HREGION_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND_MIN 
and then multiply it by this factor of 90%?
FlushLargeStoresPolicy#configureForRegion sets a lower bound for each memstore:
{code}
protected void configureForRegion(HRegion region) {
    super.configureForRegion(region);
    int familyNumber = region.getTableDesc().getFamilies().size();
    if (familyNumber <= 1) {
      // No need to parse and set flush size lower bound if only one family
      // Family number might also be zero in some of our unit test case
      return;
    }
    // For multiple families, lower bound is the "average flush size" by default
    // unless setting in configuration is larger.
    long flushSizeLowerBound = region.getMemstoreFlushSize() / familyNumber;
    long minimumLowerBound =
        getConf().getLong(HREGION_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND_MIN,
          DEFAULT_HREGION_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND_MIN);
    if (minimumLowerBound > flushSizeLowerBound) {
      flushSizeLowerBound = minimumLowerBound;
    }
    // ... (rest of method elided)
}
{code}
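To make the question concrete, here is a hypothetical sketch (not the actual patch code) of the check being questioned: trigger an in-memory flush once the active segment reaches 90% of the per-family flush-size lower bound. The 16 MB value for the lower-bound default is an assumption used for illustration.

```java
// Hypothetical sketch of the questioned threshold check; names and the
// 16 MB default are assumptions, not taken from the patch itself.
public class InMemoryFlushThreshold {
  static final double IN_MEMORY_FLUSH_THRESHOLD_FACTOR = 0.9;
  static final long FLUSH_SIZE_LOWER_BOUND_MIN = 16L * 1024 * 1024; // assumed default

  // Check performed (per the comment above) after every cell addition
  // to the active segment.
  static boolean shouldFlushInMemory(long activeSegmentSize) {
    return activeSegmentSize > FLUSH_SIZE_LOWER_BOUND_MIN * IN_MEMORY_FLUSH_THRESHOLD_FACTOR;
  }

  public static void main(String[] args) {
    // Just under the ~14.4 MB threshold: no in-memory flush yet.
    System.out.println(shouldFlushInMemory(14L * 1024 * 1024)); // false
    // Above it: flush.
    System.out.println(shouldFlushInMemory(15L * 1024 * 1024)); // true
  }
}
```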

Can we simplify our calculation: take the average max size of each memstore at 
a normal flush (i.e. memstore flush size, default 128 MB, divided by #stores) 
and multiply that by a factor for deciding the in-memory flush. Say a table 
has 2 stores, so the average max size of each memstore is 64 MB. With a factor 
of, say, 25%, we would do an in-memory flush when a memstore's size reaches 16 MB.
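The proposed calculation can be sketched as follows; the method and class names are illustrative only, and the 25% factor is the example value suggested above.

```java
// Hypothetical sketch of the simplification proposed above: derive the
// in-memory flush threshold from the region flush size divided by the
// store count, times a factor. Names are illustrative, not from the patch.
public class ProposedThreshold {
  static long inMemoryFlushThreshold(long memstoreFlushSize, int numStores, double factor) {
    long avgPerStore = memstoreFlushSize / numStores; // average max size per memstore
    return (long) (avgPerStore * factor);
  }

  public static void main(String[] args) {
    long flushSize = 128L * 1024 * 1024; // default hbase.hregion.memstore.flush.size
    // 2 stores -> 64 MB average per store; 25% of that -> 16 MB trigger.
    System.out.println(inMemoryFlushThreshold(flushSize, 2, 0.25) / (1024 * 1024)); // 16
  }
}
```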

Another concern is when a flush request comes (it can be because the global 
memstore size is above the high or low watermark, or because the region 
memstore size reaches its limit, default 128 MB, or because of an explicit 
flush call from the user via API), why do we flush only some part to disk, 
i.e. only the tail of the pipeline? IMHO, when a to-disk flush request comes, 
we must flush the whole memstore.
In the case of a flush because the lower/higher watermark was crossed, we pick 
up regions for flush in increasing order of region memstore size. This size 
includes all segments' sizes, so we may end up flushing a much smaller amount!
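The mismatch described above can be illustrated with a toy sketch: the region is ranked by its total memstore size (active segment plus all pipeline segments), while a tail-only flush releases just one segment. All names here are made up, and treating the last array element as the pipeline tail is an assumption of this illustration.

```java
// Hypothetical illustration of the concern above: regions are selected for
// flush by total memstore size, but flushing only the pipeline tail frees
// just a fraction of it. Names and the tail convention are assumptions.
public class TailFlushConcern {
  // Size that ranks the region for flushing: all segments counted.
  static long totalSize(long activeSize, long[] pipelineSegmentSizes) {
    long total = activeSize;
    for (long s : pipelineSegmentSizes) {
      total += s;
    }
    return total;
  }

  // Size actually written to disk if only the tail segment is flushed
  // (assumed here to be the last element of the pipeline array).
  static long flushedBytes(long activeSize, long[] pipelineSegmentSizes) {
    return pipelineSegmentSizes.length == 0
        ? activeSize
        : pipelineSegmentSizes[pipelineSegmentSizes.length - 1];
  }

  public static void main(String[] args) {
    long active = 10L << 20;                  // 10 MB active segment
    long[] pipeline = {12L << 20, 14L << 20}; // two immutable segments
    // Region is ranked by 36 MB total, but only 14 MB would reach disk.
    System.out.println(totalSize(active, pipeline) >> 20);    // 36
    System.out.println(flushedBytes(active, pipeline) >> 20); // 14
  }
}
```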

Another general point: we now account the memstore size in many places, at the 
RS level and region level as state variables, and the memstore itself also 
holds a size. With in-memory flushing, the size changes after each in-memory 
flush. I see we have a call via RegionServicesForStores, but doesn't all this 
make us more error prone? Do we need some sort of cleanup in this 
size-accounting area? cc 
[[email protected]]


> Compacting Memstore
> -------------------
>
>                 Key: HBASE-14920
>                 URL: https://issues.apache.org/jira/browse/HBASE-14920
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
