ramkrishna.s.vasudevan commented on HBASE-20045:

bq. I see the argument above about not bothering to cache a block if all its 
cells are weeks old. In our case, the data is advertising identifiers and can 
come in unpredictably, and like I said we have a big enough bucket cache 
anyway, so why not just cache everything? The old blocks from the compacted 
away files are going to be evicted anyway, so we should never run out of bucket 
cache if we have sized it much larger than our entire data size.
Thanks for chiming in here. I like your argument. But one thing to note is that 
even if the blocks of the old, compacted-away files are evicted, the new file 
written by the compaction will contain almost the same number of blocks 
(assuming there are no deletes), unless the column family has a TTL. I also 
thought we could do this, but the discussion here helped me understand that it 
may not always be possible, and that we may instead cache only recent data, as 
JMS suggests here. 
We should also try out your suggestion, maybe behind a config, but warn the 
user that only a big enough bucket cache can help here. So, roughly, what is 
your bucket cache size? I think it is in file mode and that the file is in S3.
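
To make the "cache recent data alone" idea concrete, here is a minimal sketch 
of the check a compaction writer could apply per block. It is only an 
illustration: RecentBlockCachePolicy and shouldCacheOnCompaction are 
hypothetical names, not existing HBase classes or configs, and it assumes the 
writer knows the newest cell timestamp of each block it writes.

```java
import java.time.Duration;

/** Hypothetical sketch, not an HBase class: decides whether a block written
 *  during compaction is recent enough to be put in the block cache. */
public final class RecentBlockCachePolicy {
    // Assumed table-level setting, e.g. 24 hours.
    private final long maxAgeMillis;

    public RecentBlockCachePolicy(Duration maxAge) {
        this.maxAgeMillis = maxAge.toMillis();
    }

    /**
     * Returns true if at least some data in the block is younger than the
     * configured age, which holds exactly when the newest cell is.
     */
    public boolean shouldCacheOnCompaction(long newestCellTimestampMillis,
                                           long nowMillis) {
        return nowMillis - newestCellTimestampMillis <= maxAgeMillis;
    }
}
```

With a very large maxAge this degenerates to "cache everything on compaction", 
which is the case where only a big enough bucket cache helps.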

> When running compaction, cache recent blocks.
> ---------------------------------------------
>                 Key: HBASE-20045
>                 URL: https://issues.apache.org/jira/browse/HBASE-20045
>             Project: HBase
>          Issue Type: New Feature
>          Components: BlockCache, Compaction
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Jean-Marc Spaggiari
>            Priority: Major
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
> Very useful for table design where there is TS in the key but a long history 
> (Like a year of sensor data).

This message was sent by Atlassian JIRA
