[ https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205667#comment-13205667 ]
Kannan Muthukkaruppan commented on HBASE-5263:
----------------------------------------------
Zhihong: Yes! Fixed it in place. I had a recursive reference going there... :)
> Preserving cached data on compactions through cache-on-write
> ------------------------------------------------------------
>
> Key: HBASE-5263
> URL: https://issues.apache.org/jira/browse/HBASE-5263
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Priority: Minor
>
> We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the
> block cache on compactions if cache-on-write is enabled. However, it would be
> ideal to reduce the effect compactions have on the cached data. For every
> block we are writing for a compacted file we can decide whether it needs to
> be cached based on whether the original blocks containing the same data were
> already in cache. More precisely, for every HFile reader in a compaction we
> can maintain a boolean flag saying whether the current key-value came from a
> disk IO or the block cache. In the HFile writer for the compaction's output
> we can maintain a flag that is set if any of the key-values in the block
> being written came from a cached block, use that flag at the end of a block
> to decide whether to cache-on-write the block, and reset the flag to false on
> a block boundary. If such an inclusive approach would still trash the cache,
> we could restrict the total number of blocks to be cached per output HFile,
> switch from "or" logic to "and" logic for deciding whether to cache an output
> file block, or cache only a certain percentage of the output file blocks that
> contain some of the previously cached data.
> Thanks to Nicolas for this elegant online algorithm idea!
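
The per-block flag described above could be sketched roughly as follows. This is only an illustration of the idea, not HBase code: the class CompactionCacheTracker, its Policy enum, and the maxCachedBlocks cap are hypothetical names made up for this example, and it assumes the compaction reader can report, for each key-value, whether its source block was served from the block cache.

```java
// Hypothetical sketch; none of these names exist in HBase.
class CompactionCacheTracker {
    // ANY = "or" logic (cache if any key-value came from a cached block),
    // ALL = "and" logic (cache only if every key-value did).
    enum Policy { ANY, ALL }

    private final Policy policy;
    private final int maxCachedBlocks; // assumed cap per output HFile
    private boolean anyFromCache = false;
    private boolean allFromCache = true;
    private int kvCount = 0;
    private int cachedBlocks = 0;

    CompactionCacheTracker(Policy policy, int maxCachedBlocks) {
        this.policy = policy;
        this.maxCachedBlocks = maxCachedBlocks;
    }

    // Called once per key-value appended to the current output block.
    // fromCache says whether the source block was read from the block cache
    // (true) or required a disk IO (false).
    void append(boolean fromCache) {
        anyFromCache |= fromCache;
        allFromCache &= fromCache;
        kvCount++;
    }

    // Called at a block boundary. Returns whether the finished block should
    // be cached on write, then resets the per-block state.
    boolean finishBlock() {
        boolean wanted = kvCount > 0
            && (policy == Policy.ANY ? anyFromCache : allFromCache);
        boolean cache = wanted && cachedBlocks < maxCachedBlocks;
        if (cache) {
            cachedBlocks++;
        }
        anyFromCache = false;
        allFromCache = true;
        kvCount = 0;
        return cache;
    }
}
```

With Policy.ANY and a large cap this matches the inclusive approach in the description. Policy.ALL and a small maxCachedBlocks correspond to two of the suggested fallbacks if the inclusive approach still trashes the cache.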