Jean-Marc Spaggiari commented on HBASE-20045:

I agree with [~anoop.hbase]. By default we should NOT cache all the blocks of 
major compactions. If the file is 30GB, it will indeed exhaust most of the BC. 
Not good. The idea is to cache only blocks containing cells younger than a 
given timestamp. So on major compactions, most of the blocks will most likely 
not go into the cache. But when you have a table with a TTL of a day and very 
fast ingestion, then most of them will stay in the cache. It should be a 
CF-level parameter, exactly like the TTL.
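To make the idea concrete, here is a minimal sketch of the per-block decision; 
the class, its field, and the shouldCache hook are all hypothetical, nothing 
like this exists in the compaction write path today:

{code:java}
// Sketch only: decide whether a block written during (major) compaction
// should be put in the BlockCache. All names here are hypothetical.
public final class CompactionBlockCachePolicy {

  /** CF-level threshold in ms (e.g. 24h), analogous to the CF TTL. */
  private final long cacheIfYoungerThanMs;

  public CompactionBlockCachePolicy(long cacheIfYoungerThanMs) {
    this.cacheIfYoungerThanMs = cacheIfYoungerThanMs;
  }

  /**
   * Cache the block only if at least one of its cells is younger than the
   * threshold. maxCellTimestampMs is the max cell timestamp observed while
   * the writer was filling the block.
   */
  public boolean shouldCache(long maxCellTimestampMs, long nowMs) {
    return nowMs - maxCellTimestampMs < cacheIfYoungerThanMs;
  }
}
{code}

With a 30GB file from a major compaction, most blocks have an old max cell 
timestamp and are skipped; with a one-day TTL and fast ingestion, nearly every 
block passes the check and stays cached.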

> When running compaction, cache recent blocks.
> ---------------------------------------------
>                 Key: HBASE-20045
>                 URL: https://issues.apache.org/jira/browse/HBASE-20045
>             Project: HBase
>          Issue Type: New Feature
>          Components: BlockCache, Compaction
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Jean-Marc Spaggiari
>            Priority: Major
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running a compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
> Very useful for table designs where there is a TS in the key but a long history 
> (like a year of sensor data).
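For illustration, the quoted proposal could surface as a CF-level attribute 
set next to the TTL. A minimal sketch: setTimeToLive and setCacheDataOnWrite 
are existing ColumnFamilyDescriptorBuilder methods, while the 
CACHE_COMPACTED_BLOCKS_TTL attribute is hypothetical:

{code:java}
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class SensorCfSketch {
  public static ColumnFamilyDescriptor sensorCf() {
    return ColumnFamilyDescriptorBuilder
        .newBuilder(Bytes.toBytes("d"))
        .setTimeToLive(365 * 24 * 3600)   // existing API: keep a year of history
        .setCacheDataOnWrite(true)        // existing API: cache blocks on flush
        // Hypothetical attribute for the proposed feature:
        .setValue(Bytes.toBytes("CACHE_COMPACTED_BLOCKS_TTL"),
                  Bytes.toBytes(String.valueOf(24 * 3600)))
        .build();
  }
}
{code}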
