[ https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394843#comment-16394843 ]
Anoop Sam John commented on HBASE-20045:
----------------------------------------
What we are discussing here is that only the recent data blocks should get
cached. How to define "recent" should be left to user config, so for use cases
where there is enough cache space, the user can configure it to cache all
blocks. By default we should keep the existing behavior of caching nothing.

Also note that the new compacted file is written first. Only when that is
committed do we start archiving the old, compacted-away files and removing
their blocks from the BC. And what if the final commit step of the compaction
does not succeed? We need to consider all of these, even if they are edge
cases.

So ideally, while you are compacting, you will need double the size of those
files in the cache area if this cache-on-write is to happen. Chances are that
the new caching will try to evict the blocks of the files being compacted, but
those blocks were just accessed by the compaction read, so who knows; some
other valid blocks might get evicted instead. That is again an issue. Can we
distinguish compaction reads from user reads at the BC layer and not count a
compaction read as a recent access?

Yes, once we have a patch it would be interesting to play with this and try it
out. Hope JMS will provide one.
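
To make the question about compaction reads concrete, here is a rough sketch
of an LRU cache whose read path takes a flag saying whether the read comes
from a compaction. It is not HBase's block cache code; the class name, the
String keys and the compactionRead flag are all made up for illustration. A
compaction read returns the block but leaves its position in the eviction
order untouched, so only user reads count as recent access.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical sketch only; none of these names exist in HBase.
class CompactionAwareLruSketch {
    private final int capacity;
    // Block contents by key.
    private final Map<String, byte[]> blocks = new HashMap<>();
    // Keys ordered from least to most recently used.
    private final LinkedHashSet<String> recency = new LinkedHashSet<>();

    CompactionAwareLruSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Caches a block, evicting least recently used blocks when over capacity.
    void put(String key, byte[] block) {
        blocks.put(key, block);
        touch(key);
        while (blocks.size() > capacity) {
            Iterator<String> it = recency.iterator();
            String eldest = it.next();
            it.remove();
            blocks.remove(eldest);
        }
    }

    // Returns the cached block or null. Only user reads refresh recency;
    // a compaction read leaves the eviction order as it was.
    byte[] get(String key, boolean compactionRead) {
        byte[] block = blocks.get(key);
        if (block != null && !compactionRead) {
            touch(key);
        }
        return block;
    }

    boolean contains(String key) {
        return blocks.containsKey(key);
    }

    // Moves the key to the most recently used end of the order.
    private void touch(String key) {
        recency.remove(key);
        recency.add(key);
    }
}
```

With a capacity of two blocks, reading the oldest block through the compaction
path and then caching a third block still evicts that oldest block, whereas a
user read of it would have protected it.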
> When running compaction, cache recent blocks.
> ---------------------------------------------
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
> Issue Type: New Feature
> Components: BlockCache, Compaction
> Affects Versions: 2.0.0-beta-1
> Reporter: Jean-Marc Spaggiari
> Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for
> use cases where most queries are against recent data. However, as soon as
> there is a compaction, those blocks are evicted. It would be interesting to
> have a table-level parameter to say "When compacting, cache blocks less than
> 24 hours old". That way, when running compaction, all blocks where some data
> is less than 24h old will be automatically cached.
>
> Very useful for table designs where there is a TS in the key but a long
> history (like a year of sensor data).
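
As a rough illustration of the table-level parameter requested above, the
decision could look something like the following. This is only a sketch under
assumptions: the class and method names are hypothetical, no such setting is
claimed to exist in HBase, and it assumes the newest cell timestamp of each
block is known when the compacted file is written. A zero max age stands for
the existing behavior of caching nothing on compaction.

```java
import java.time.Duration;

// Hypothetical sketch only; not an HBase class or configuration key.
class CompactionCachePolicySketch {
    // Blocks whose newest data is younger than this get cached.
    // Zero means "cache nothing", matching the existing behavior.
    private final long maxAgeMillis;

    CompactionCachePolicySketch(Duration maxAge) {
        if (maxAge.isNegative()) {
            throw new IllegalArgumentException("maxAge must not be negative");
        }
        this.maxAgeMillis = maxAge.toMillis();
    }

    // A block qualifies if at least some of its data is recent enough,
    // which is the case exactly when its newest cell is recent enough.
    boolean shouldCacheOnCompaction(long newestCellTimestampMillis, long nowMillis) {
        if (maxAgeMillis == 0) {
            return false;
        }
        long ageMillis = nowMillis - newestCellTimestampMillis;
        return ageMillis < maxAgeMillis;
    }
}
```

With a max age of 24 hours, a block whose newest cell is one hour old would be
cached during compaction and one whose newest cell is two days old would not.
A user with plenty of cache could set a very large max age to cache all
blocks.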