HLog Compactions
----------------
Key: HBASE-3242
URL: https://issues.apache.org/jira/browse/HBASE-3242
Project: HBase
Issue Type: Improvement
Components: replication
Reporter: Nicolas Spiegelberg
Currently, our memstore flush algorithm is pretty trivial. We let a memstore
grow to a flushsize and then flush that region, or we let the HLogs grow to a
certain log count and then flush every region holding edits below a seqid. In
certain situations, we can get big wins from being more intelligent with our
memstore flush algorithm. I suggest we look into algorithms that intelligently
handle HLog compactions. By compaction, I mean replacing existing HLogs with
new HLogs created from the contents of a memstore snapshot. Situations where
we can get huge wins:
1. In the incrementColumnValue case, N HLog entries often correspond to a
single memstore entry. Although we may have large HLog files, our memstore
could be relatively small.
2. If we have a hot region, the majority of the HLog consists of that one
region's edits, and the other regions' edits are minuscule.
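A minimal sketch of the idea behind situation 1, written in Java since HBase is. It assumes each incrementColumnValue call is logged as the cell's new absolute value and models an HLog as a plain list; `LogEntry` and `compact()` are hypothetical names, not HBase APIs. Replaying the log into a memstore-like map keeps only the latest edit per cell, so rewriting the log from that snapshot collapses N entries into one entry per cell.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of an "HLog compaction": rewrite the log from the
 * contents of a memstore snapshot. Not an HBase API.
 */
public class HLogCompactionSketch {

    /** One log edit (assumption: the cell's full new value, tagged with a seqid). */
    public record LogEntry(long seqId, String region, String cell, long value) {}

    /**
     * Replays the log into a memstore-like map where the last write per cell
     * wins, then emits one entry per live cell, i.e. the snapshot's contents.
     */
    public static List<LogEntry> compact(List<LogEntry> log) {
        // LinkedHashMap keeps first-insertion order per key; put() replaces the value.
        Map<String, LogEntry> snapshot = new LinkedHashMap<>();
        for (LogEntry e : log) {
            snapshot.put(e.region() + "/" + e.cell(), e);
        }
        return new ArrayList<>(snapshot.values());
    }

    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        long seq = 0;
        // 1000 increments of one hot counter produce 1000 log entries...
        for (int i = 1; i <= 1000; i++) {
            log.add(new LogEntry(++seq, "hot", "row1:cf:counter", i));
        }
        // ...plus a single edit to a cold region.
        log.add(new LogEntry(++seq, "cold", "rowA:cf:q", 42));

        List<LogEntry> compacted = compact(log);
        System.out.println(log.size() + " entries -> " + compacted.size() + " entries");
    }
}
```

Running `main` prints `1001 entries -> 2 entries`. A real implementation would also have to keep sequence ids consistent with what has already been flushed and handle deletes and multiple versions per cell, which this sketch ignores.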
In both cases, we are forced to flush a bunch of very small stores. It's really
hard for a compaction algorithm to be efficient when it has no guarantees about
the approximate size of a new StoreFile, so it currently does unconditional,
inefficient compactions. Additionally, compactions & flushes suck because they
invalidate cache entries, be it in the memstore or the LRUcache. If we can limit
flushes to cases where we will have significant HFile output on a per-Store
basis, we can improve performance and stability and reduce failover time.
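The per-Store flush decision proposed above could look roughly like the following sketch. The `minFlushBytes` threshold, the `StoreState` and `Plan` types, and the use of memstore size as a proxy for the size of the resulting StoreFile are all assumptions for illustration, not existing HBase configuration keys or APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical selective-flush policy. All names and thresholds are
 * illustrative assumptions, not HBase APIs or configuration.
 */
public class SelectiveFlushSketch {

    /** Per-store state relevant to the decision. */
    public record StoreState(String name, long memstoreBytes) {}

    /** Stores to flush now vs. stores whose edits an HLog compaction carries forward. */
    public record Plan(List<String> flush, List<String> keepAndRewriteLog) {}

    /**
     * Splits stores into those worth flushing (memstore at or above
     * minFlushBytes, so the new StoreFile is reasonably sized) and small ones
     * that stay in memory while the HLog is rewritten from a snapshot.
     */
    public static Plan plan(List<StoreState> stores, long minFlushBytes) {
        List<String> flush = new ArrayList<>();
        List<String> keep = new ArrayList<>();
        for (StoreState s : stores) {
            if (s.memstoreBytes() >= minFlushBytes) {
                flush.add(s.name());
            } else {
                keep.add(s.name());
            }
        }
        return new Plan(flush, keep);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<StoreState> stores = List.of(
                new StoreState("hot/cf", 48 * mb),
                new StoreState("cold1/cf", 200 * 1024),
                new StoreState("cold2/cf", 3 * 1024));
        // 8 MB is an arbitrary example threshold.
        Plan p = plan(stores, 8 * mb);
        System.out.println("flush: " + p.flush());
        System.out.println("rewrite log: " + p.keepAndRewriteLog());
    }
}
```

With the 8 MB threshold, `main` prints `flush: [hot/cf]` and `rewrite log: [cold1/cf, cold2/cf]`. Stores kept in memory still consume heap, so a realistic policy would also need a global memory bound that forces a flush regardless of per-store size.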