[ 
https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HBASE-5263:
---------------------------------

    Assignee: Rishit Shroff  (was: Mikhail Bautin)
    
> Preserving cached data on compactions through cache-on-write
> ------------------------------------------------------------
>
>                 Key: HBASE-5263
>                 URL: https://issues.apache.org/jira/browse/HBASE-5263
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Rishit Shroff
>            Priority: Minor
>
> We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the 
> block cache on compactions if cache-on-write is enabled. However, it would be 
> ideal to reduce the effect compactions have on the cached data. For every 
> block we are writing for a compacted file we can decide whether it needs to 
> be cached based on whether the original blocks containing the same data were 
> already in cache. More precisely, for every HFile reader in a compaction we 
> can maintain a boolean flag saying whether the current key-value came from a 
> disk IO or the block cache. In the HFile writer for the compaction's output 
> we can maintain a flag that is set if any of the key-values in the block 
> being written came from a cached block, use that flag at the end of a block 
> to decide whether to cache-on-write the block, and reset the flag to false on 
> a block boundary. If such an inclusive approach still trashes the cache, 
> we could restrict the total number of blocks cached per output HFile, 
> switch from "or" logic to "and" logic when deciding whether to cache an 
> output file block, or cache only a certain percentage of the output file 
> blocks that contain some previously cached data. 
> Thanks to Nicolas for this elegant online algorithm idea!
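
The per-block flag described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not HBase code: the class
CompactionCacheSketch, the Policy enum, and the decide() method are
hypothetical names, each key-value is reduced to its "came from a cached
block" boolean, and a fixed count of key-values per block stands in for the
real block size limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the per-block cache-on-write decision; not an HBase API. */
class CompactionCacheSketch {

    /** ANY is the inclusive "or" logic; ALL is the stricter "and" logic. */
    enum Policy { ANY, ALL }

    /**
     * Walks the compaction's merged key-value stream and returns one
     * cache-on-write decision per output block.
     *
     * @param fromCache   per key-value flag: true if its source block was in the block cache
     * @param kvsPerBlock assumed stand-in for the output block size limit
     * @param policy      how the per key-value flags are combined within a block
     */
    static List<Boolean> decide(boolean[] fromCache, int kvsPerBlock, Policy policy) {
        if (kvsPerBlock <= 0) {
            throw new IllegalArgumentException("kvsPerBlock must be positive");
        }
        List<Boolean> decisions = new ArrayList<>();
        boolean any = false;  // set if any key-value in the block came from cache
        boolean all = true;   // cleared if any key-value in the block came from disk
        int count = 0;        // key-values written to the current block so far

        for (boolean cached : fromCache) {
            any |= cached;
            all &= cached;
            count++;
            if (count == kvsPerBlock) {
                // Block boundary: record the decision, then reset the flags.
                decisions.add(policy == Policy.ANY ? any : all);
                any = false;
                all = true;
                count = 0;
            }
        }
        if (count > 0) {
            // Flush the final, partially filled block.
            decisions.add(policy == Policy.ANY ? any : all);
        }
        return decisions;
    }
}
```

With two key-values per block and source flags (disk, cache, cache, cache,
disk), the ANY policy would cache the first two output blocks, while the ALL
policy would cache only the second. A cap on cached blocks per output file or
a percentage limit could be layered on top of the returned decisions.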

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
