Preserving cached data on compactions through cache-on-write
------------------------------------------------------------
Key: HBASE-5263
URL: https://issues.apache.org/jira/browse/HBASE-5263
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the block
cache on compactions when cache-on-write is enabled. However, it would be ideal
to also reduce the effect compactions have on data that is already cached. For
every block we write to a compacted file, we can decide whether to cache it
based on whether the original blocks containing the same data were already in
the cache.

More precisely, for every HFile reader in a compaction we can maintain a
boolean flag saying whether the current key-value came from disk IO or from the
block cache. In the HFile writer for the compaction's output we can maintain a
flag that is set if any of the key-values in the block being written came from
a cached block. At the end of each block we use that flag to decide whether to
cache-on-write the block, and then reset the flag to false at the block
boundary.

If such an inclusive approach still trashes the cache, we could restrict the
total number of blocks cached per output HFile, switch from "or" logic to "and"
logic when deciding whether to cache an output file block, or cache only a
certain percentage of the output file blocks that contain some of the
previously cached data.
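
The per-block flag logic described above could be sketched roughly as follows.
This is only an illustration, not HBase code: CompactionCacheSketch,
BlockWriter, Policy, and decide are hypothetical names, a "block" is assumed to
be a fixed number of key-values, and each key-value is reduced to the boolean
its reader would report (true if it came from the block cache). The "or" and
"and" variants and the per-file cap are covered; the percentage variant is left
out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are HBase classes or APIs.
final class CompactionCacheSketch {

    // ANY_CACHED is the inclusive "or" rule; ALL_CACHED is the stricter "and" rule.
    enum Policy { ANY_CACHED, ALL_CACHED }

    // Stand-in for the compaction output writer. A block is assumed to hold
    // a fixed number of key-values to keep the sketch small.
    static final class BlockWriter {
        private final int kvsPerBlock;
        private final Policy policy;
        private final int maxCachedBlocks;
        private final List<Boolean> decisions = new ArrayList<>();
        private int kvsInBlock = 0;
        private int cachedBlocks = 0;
        private boolean anyFromCache = false;
        private boolean allFromCache = true;

        BlockWriter(int kvsPerBlock, Policy policy, int maxCachedBlocks) {
            this.kvsPerBlock = kvsPerBlock;
            this.policy = policy;
            this.maxCachedBlocks = maxCachedBlocks;
        }

        // fromBlockCache is the reader-side flag for the current key-value.
        void append(boolean fromBlockCache) {
            anyFromCache |= fromBlockCache;
            allFromCache &= fromBlockCache;
            kvsInBlock++;
            if (kvsInBlock == kvsPerBlock) {
                finishBlock();
            }
        }

        // Decide at the block boundary, then reset the per-block flags.
        void finishBlock() {
            if (kvsInBlock == 0) {
                return; // nothing buffered, so there is no block to decide on
            }
            boolean wanted =
                policy == Policy.ANY_CACHED ? anyFromCache : allFromCache;
            boolean cacheOnWrite = wanted && cachedBlocks < maxCachedBlocks;
            if (cacheOnWrite) {
                cachedBlocks++;
            }
            decisions.add(cacheOnWrite);
            kvsInBlock = 0;
            anyFromCache = false;
            allFromCache = true;
        }

        List<Boolean> decisions() {
            return decisions;
        }
    }

    // Runs the flags of a merged key-value stream through a writer and
    // returns one cache-on-write decision per output block.
    static List<Boolean> decide(boolean[] fromCacheFlags, int kvsPerBlock,
                                Policy policy, int maxCachedBlocks) {
        BlockWriter writer = new BlockWriter(kvsPerBlock, policy, maxCachedBlocks);
        for (boolean flag : fromCacheFlags) {
            writer.append(flag);
        }
        writer.finishBlock(); // flush a trailing partial block, if any
        return writer.decisions();
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true, false, false, true, true};
        System.out.println(decide(flags, 2, Policy.ANY_CACHED, Integer.MAX_VALUE));
        System.out.println(decide(flags, 2, Policy.ALL_CACHED, Integer.MAX_VALUE));
        System.out.println(decide(flags, 2, Policy.ANY_CACHED, 1));
    }
}
```

Keeping both an "any" and an "all" flag per block makes the switch between the
"or" and "and" rules a one-line choice at the block boundary, and the cap is
just a counter checked at the same point.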
Thanks to Nicolas for this elegant online algorithm idea!