Anoop Sam John commented on HBASE-20045:

What we are discussing here is that only the recent data blocks should get 
cached.  How to define "recent" should be left to user config, so for usages 
where there is enough cache size, the user can configure it to cache all 
blocks.  By default we should keep the existing behavior of caching nothing.  
Also note that the new compacted file is written first.  Only when that is 
committed do we start archiving the old, compacted-away files and removing the 
blocks of those files from the BC.  And what if the final commit step of the 
compaction does not succeed?  We need to consider all cases, even edge cases.  
So ideally, while you are compacting, you will need double the size of those 
files in the cache area if this cache on write is to happen.  Chances are that 
the new caching will try to evict the blocks of the files being compacted.  But 
those blocks were recently accessed by the compaction read, so who knows; some 
other valid blocks might get evicted instead.  That is again an issue.  Can we 
distinguish reads by the compaction op from user reads at the BC layer, and not 
count a compaction read as a recent access?  Once we have a patch, it would be 
interesting to play with this and try it out.  Hope JMS will provide one.
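
To make the config idea concrete, here is a rough sketch of the decision only. 
Everything in it is hypothetical: the class CompactionCachePolicy and its 
methods are made-up names, not existing HBase code, and it assumes the 
compaction writer knows the newest cell timestamp in each block it writes. A 
max age of zero keeps the current default of caching nothing.

```java
import java.time.Duration;

/**
 * Hypothetical policy sketch, not an existing HBase class. It decides whether
 * a block written by a compaction should be cached, based on a configured
 * maximum age for the newest cell in the block.
 */
public final class CompactionCachePolicy {
    // 0 means "cache nothing"; Long.MAX_VALUE means "cache every block".
    private final long maxAgeMillis;

    private CompactionCachePolicy(long maxAgeMillis) {
        if (maxAgeMillis < 0) {
            throw new IllegalArgumentException("max age must not be negative");
        }
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Existing behavior: compaction output is never cached on write. */
    public static CompactionCachePolicy cacheNothing() {
        return new CompactionCachePolicy(0L);
    }

    /** For deployments with enough cache to hold all compacted blocks. */
    public static CompactionCachePolicy cacheEverything() {
        return new CompactionCachePolicy(Long.MAX_VALUE);
    }

    /** Cache only blocks whose newest cell is younger than maxAge. */
    public static CompactionCachePolicy ofMaxAge(Duration maxAge) {
        return new CompactionCachePolicy(maxAge.toMillis());
    }

    /**
     * @param newestCellTimestampMillis newest cell timestamp in the block
     *        (assumed to be tracked by the writer)
     * @param nowMillis current time, passed in so the check is testable
     */
    public boolean shouldCache(long newestCellTimestampMillis, long nowMillis) {
        if (maxAgeMillis == 0L) {
            return false;
        }
        if (maxAgeMillis == Long.MAX_VALUE) {
            return true;
        }
        // A block qualifies if at least some of its data is recent enough.
        return nowMillis - newestCellTimestampMillis < maxAgeMillis;
    }
}
```

With ofMaxAge(Duration.ofHours(24)), a block whose newest cell is one hour old 
would be cached and one whose newest cell is 25 hours old would not. This says 
nothing about the eviction and commit-failure concerns above; those still need 
to be handled at the BC layer.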

> When running compaction, cache recent blocks.
> ---------------------------------------------
>                 Key: HBASE-20045
>                 URL: https://issues.apache.org/jira/browse/HBASE-20045
>             Project: HBase
>          Issue Type: New Feature
>          Components: BlockCache, Compaction
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Jean-Marc Spaggiari
>            Priority: Major
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running a compaction, all blocks in which some 
> data is less than 24 hours old will be automatically cached. 
> This is very useful for table designs where there is a TS in the key but a 
> long history (like a year of sensor data).

This message was sent by Atlassian JIRA
