[
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276076#comment-15276076
]
Eshcar Hillel commented on HBASE-14920:
---------------------------------------
There are two distinct issues here.
bq. dont you think this part only flush will make more number of small sized
files and so more #compactions?
On the contrary, the evaluation experiments we ran show that with the compacting
memstore, flushed files are bigger and are created less frequently, and thus incur
less compaction. Let me explain.
With the default memstore, consider the case where the memory is filled and all
the data (128MB) is flushed to disk. If there is some duplication in the data,
it is removed by a compaction while flushing the data to a file; this reduces
the size of the data by the duplication factor. In addition, the data is
compressed on disk and thus takes even less space. For example, in our
experiments the sizes of the flushed files were 76MB, 60MB, and 24MB for the
uniform, zipfian, and hotspot key distributions, respectively. That is, the more
skewed the data, the smaller the files.
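(Put differently, relative to the 128MB of in-memory data, duplicate elimination
plus on-disk compression shrink the flushed file by roughly 128/76 ≈ 1.7x for
uniform, 128/60 ≈ 2.1x for zipfian, and 128/24 ≈ 5.3x for hotspot.)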
On the other hand, with the compacting memstore (where the segment at the tail
of the pipeline has been compacted at least once), not only are the files bigger,
they are also flushed to disk much later. When the optimizations of HBASE-14921
are in, memory utilization will be even better.
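To make the pipeline idea concrete, here is a minimal, self-contained toy sketch
in Java of the behavior described above; the class, segment, and method names are
made up for illustration and are not the actual CompactingMemStore API. Mutations
land in a small active segment; when it fills up it is pushed onto an in-memory
pipeline and compacted there, so duplicate keys collapse before anything is
written, and a flush hands out only the compacted tail:
{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.TreeMap;

// Toy model of the idea above: mutations go into an active segment; when it
// fills up it becomes immutable, joins a pipeline, and the pipeline is
// compacted in memory (duplicate keys collapse to the latest value).
// A "flush" writes out only the compacted tail, so files are larger and rarer.
// Illustrative names only, not the real HBase CompactingMemStore classes.
public class CompactingMemstoreSketch {
  private static final int ACTIVE_LIMIT = 4; // tiny threshold for the demo

  private TreeMap<String, String> active = new TreeMap<>();
  private final Deque<TreeMap<String, String>> pipeline = new ArrayDeque<>();

  public void put(String key, String value) {
    active.put(key, value);
    if (active.size() >= ACTIVE_LIMIT) {
      pipeline.addFirst(active);  // the active segment becomes immutable
      active = new TreeMap<>();
      compactPipeline();          // in-memory compaction, no disk I/O
    }
  }

  // Merge all pipeline segments into one; duplicates across segments collapse.
  private void compactPipeline() {
    TreeMap<String, String> merged = new TreeMap<>();
    // iterate from oldest to newest so newer values overwrite older ones
    Iterator<TreeMap<String, String>> it = pipeline.descendingIterator();
    while (it.hasNext()) {
      merged.putAll(it.next());
    }
    pipeline.clear();
    pipeline.addFirst(merged);
  }

  // "Flush": hand back the compacted tail segment; only this part hits disk.
  public TreeMap<String, String> flushTail() {
    return pipeline.pollLast();
  }

  public static void main(String[] args) {
    CompactingMemstoreSketch m = new CompactingMemstoreSketch();
    for (int i = 0; i < 16; i++) {
      m.put("row" + (i % 6), "v" + i);  // skewed keys -> many duplicates
    }
    TreeMap<String, String> flushed = m.flushTail();
    System.out.println("flushed " + (flushed == null ? 0 : flushed.size())
        + " unique cells instead of 16 raw mutations");
  }
}
{code}
Running it with 16 puts over 6 skewed row keys flushes only 6 unique cells, which
is the small-scale analogue of the bigger-but-rarer files seen in the experiments.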
bq. So how will it behave when system call region flush before a graceful stop
(region close)? This will happen before split, after a replay etc? Then also we
wont do the full flush?
The way HRegion currently handles a graceful stop is by issuing a loop of flush
requests, in between checking whether the size of the memstore has become zero.
{code}
// Don't flush the cache if we are aborting
if (!abort && canFlush) {
  int flushCount = 0;
  while (this.memstoreSize.get() > 0) {
    try {
      if (flushCount++ > 0) {
        int actualFlushes = flushCount - 1;
        if (actualFlushes > 5) {
          // If we tried 5 times and are unable to clear memory, abort
          // so we do not lose data
          throw new DroppedSnapshotException("Failed clearing memory after " +
              actualFlushes + " attempts on region: " +
              Bytes.toStringBinary(getRegionInfo().getRegionName()));
        }
        LOG.info("Running extra flush, " + actualFlushes +
            " (carrying snapshot?) " + this);
      }
      internalFlushcache(status);
    } catch (IOException ioe) {
      status.setStatus("Failed flush " + this + ", putting online again");
      synchronized (writestate) {
        writestate.writesEnabled = true;
      }
      // Have to throw to upper layers. I can't abort server from here.
      throw ioe;
    }
  }
}
{code}
Here is a suggestion for how to tweak this code so that it also handles the case
of a compacting memstore: count the number of "failed" flush requests, that is,
flush requests that did not reduce the size of the memstore, and limit the number
of failed attempts. This is equivalent to the original intent.
{code}
// Don't flush the cache if we are aborting
if (!abort && canFlush) {
  int failedFlushCount = 0;
  int flushCount = 0;
  long remainingSize = this.memstoreSize.get();
  while (remainingSize > 0) {
    try {
      internalFlushcache(status);
      if (flushCount > 0) {
        LOG.info("Running extra flush, " + flushCount +
            " (carrying snapshot?) " + this);
      }
      flushCount++;
      long tmp = this.memstoreSize.get();
      if (tmp >= remainingSize) {
        // this flush did not reduce the memstore size; count it as failed
        failedFlushCount++;
      }
      remainingSize = tmp;
      if (failedFlushCount > 5) {
        // If we failed 5 times and are unable to clear memory, abort
        // so we do not lose data
        throw new DroppedSnapshotException("Failed clearing memory after " +
            flushCount + " attempts on region: " +
            Bytes.toStringBinary(getRegionInfo().getRegionName()));
      }
    } catch (IOException ioe) {
      status.setStatus("Failed flush " + this + ", putting online again");
      synchronized (writestate) {
        writestate.writesEnabled = true;
      }
      // Have to throw to upper layers. I can't abort server from here.
      throw ioe;
    }
  }
}
{code}
> Compacting Memstore
> -------------------
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
> Issue Type: Sub-task
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch,
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch,
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable
> segment representation