HLog Compactions
----------------

                 Key: HBASE-3242
                 URL: https://issues.apache.org/jira/browse/HBASE-3242
             Project: HBase
          Issue Type: Improvement
          Components: replication
            Reporter: Nicolas Spiegelberg


Currently, our memstore flush algorithm is pretty trivial.  We either let a 
memstore grow to the flush size and flush that region, or let the HLog count 
grow to a certain limit and then flush everything below a seqid.  In certain 
situations, we can get big wins from 
being more intelligent with our memstore flush algorithm.  I suggest we look 
into algorithms to intelligently handle HLog compactions.  By compaction, I 
mean replacing existing HLogs with new HLogs created using the contents of a 
memstore snapshot.  Situations where we can get huge wins:

1. In the incrementColumnValue case,  N HLog entries often correspond to a 
single memstore entry.  Although we may have large HLog files, our memstore 
could be relatively small.
2. If we have a hot region, the majority of the HLog consists of edits to that 
one region, and the edits to other regions are minuscule.
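
To make case 1 concrete, here is a minimal sketch of what replacing an HLog 
with one built from a memstore snapshot could look like.  Everything in it is 
an assumption for illustration: LogEntry, apply_to_memstore and compact_log 
are hypothetical names, not HBase classes or methods, and the toy memstore 
only models increments counted from the last flush.

```python
from collections import namedtuple

# Hypothetical, simplified log entry; not HBase's real HLog entry type.
LogEntry = namedtuple("LogEntry", ["seqid", "region", "row", "column", "delta"])


def apply_to_memstore(entries):
    """Replay increment entries into a toy memstore.

    The memstore maps (region, row, column) to (current value, last seqid).
    """
    memstore = {}
    for e in entries:
        key = (e.region, e.row, e.column)
        value, _ = memstore.get(key, (0, 0))
        memstore[key] = (value + e.delta, e.seqid)
    return memstore


def compact_log(entries):
    """Rewrite the log from a memstore snapshot.

    Emits one entry per live cell, carrying the accumulated value and the
    seqid of the last edit that touched the cell.
    """
    snapshot = apply_to_memstore(entries)
    compacted = [
        LogEntry(seqid, region, row, column, value)
        for (region, row, column), (value, seqid) in snapshot.items()
    ]
    return sorted(compacted, key=lambda e: e.seqid)


# 1000 increments of one counter plus a single edit to another region.
log = [LogEntry(seqid, "r1", "row1", "cf:count", 1) for seqid in range(1, 1001)]
log.append(LogEntry(1001, "r2", "row9", "cf:count", 5))

compacted = compact_log(log)
print(len(log), "->", len(compacted))  # prints "1001 -> 2"
```

Replaying the compacted log into an empty memstore gives the same state as 
replaying the original, which is the property a real implementation would 
have to preserve for recovery.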

In both cases, we are forced to flush a bunch of very small stores.  It's really 
hard for a compaction algorithm to be efficient when it has no guarantees of 
the approximate size of a new StoreFile, so it currently does unconditional, 
inefficient compactions.  Additionally, compactions & flushes suck because they 
invalidate cache entries: be it memstore or LRUcache.  If we can limit flushes 
to cases where we will have significant HFile output on a per-Store basis, we 
can get improved performance, stability, and reduced failover time.
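
One way to picture such a policy, as a rough sketch only: flush a store when 
its memstore is big enough to produce a worthwhile HFile, and when the log is 
over its limit but small stores remain, compact the log instead of forcing 
them out.  The function plan_flush and both thresholds are hypothetical; they 
are not existing HBase settings or APIs.

```python
def plan_flush(store_sizes, log_bytes, min_flush_bytes, max_log_bytes):
    """Decide which stores to flush and whether to compact the HLog.

    store_sizes maps a store name to its current memstore size in bytes.
    Returns (stores_to_flush, compact_log).
    """
    # Flush only stores that would produce a significant HFile.
    to_flush = [
        name
        for name, size in sorted(store_sizes.items())
        if size >= min_flush_bytes
    ]
    # Under log pressure, small stores that stay in memory are covered by a
    # log compaction rather than by a tiny flush.
    small_stores_remain = len(to_flush) < len(store_sizes)
    compact_log = log_bytes > max_log_bytes and small_stores_remain
    return to_flush, compact_log


MB = 1024 * 1024
sizes = {"hot/cf": 64 * MB, "cold1/cf": 200 * 1024, "cold2/cf": 50 * 1024}

plan = plan_flush(sizes, log_bytes=2048 * MB,
                  min_flush_bytes=16 * MB, max_log_bytes=1024 * MB)
print(plan)  # prints "(['hot/cf'], True)"
```

In the hot-region case this flushes only the hot store and leaves the two 
cold stores in the memstore, with the log rewritten to cover them.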

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
