[
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394811#comment-16394811
]
ramkrishna.s.vasudevan commented on HBASE-20045:
------------------------------------------------
bq. I see the argument above about not bothering to cache a block if all its
cells are weeks old. In our case, the data is advertising identifiers and can
come in unpredictably, and like I said we have a big enough bucket cache
anyway, so why not just cache everything? The old blocks from the compacted
away files are going to be evicted anyway, so we should never run out of bucket
cache if we have sized it much larger than our entire data size.
[~saadmufti]
Thanks for chiming in here. I like your argument. But one thing to note: even
if the blocks of the compacted (old) files are evicted, the new file created by
the compaction will contain almost the same number of blocks (assuming there
are no deletes), unless the column family has a TTL. I also thought we could do
this, but the discussion here helped me understand that it may not always be
possible; perhaps we can cache only the recent data, as JMS suggests here.
We should also try out your suggestion, maybe behind a config, but warn the
user that it only helps with a big enough bucket cache. Roughly, what is your
bucket cache size? I think it is in file mode and that the file is in S3.
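To make the two options concrete, here is a rough sketch of how a config-gated
decision could look. It is written in Python purely for illustration; the names
(CacheOnCompaction, BlockInfo, should_cache_block, max_age_ms) are hypothetical
and are not existing HBase APIs or configuration keys. "RECENT" follows the
rule in the issue description (cache a block if some of its data is newer than
the threshold), and "ALL" is the cache-everything variant suggested above.

```python
from dataclasses import dataclass
from enum import Enum


class CacheOnCompaction(Enum):
    """Hypothetical per-table setting; not a real HBase option."""
    NONE = "none"      # evict on compaction, cache nothing that is rewritten
    RECENT = "recent"  # cache only blocks holding recent data
    ALL = "all"        # cache every block written by the compaction


@dataclass(frozen=True)
class BlockInfo:
    """Illustrative stand-in for per-block metadata."""
    newest_cell_ts_ms: int  # timestamp of the newest cell in the block


def should_cache_block(mode: CacheOnCompaction, block: BlockInfo,
                       now_ms: int, max_age_ms: int) -> bool:
    """Decide whether a block written during compaction should be cached."""
    if mode is CacheOnCompaction.ALL:
        # Only sensible when the bucket cache is larger than the data set.
        return True
    if mode is CacheOnCompaction.RECENT:
        # Cache if at least some data in the block is younger than max_age_ms.
        return now_ms - block.newest_cell_ts_ms < max_age_ms
    return False
```

With a 24 hour threshold, a block whose newest cell is 12 hours old would be
cached in RECENT mode, while a block whose newest cell is two days old would
not; in ALL mode both would be cached.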
> When running compaction, cache recent blocks.
> ---------------------------------------------
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
> Issue Type: New Feature
> Components: BlockCache, Compaction
> Affects Versions: 2.0.0-beta-1
> Reporter: Jean-Marc Spaggiari
> Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for
> use cases where most queries are against recent data. However, as soon as
> there is a compaction, those blocks are evicted. It would be interesting to
> have a table level parameter to say "When compacting, cache blocks less than
> 24 hours old". That way, when running compaction, all blocks where some data
> are less than 24h old will be automatically cached.
>
> Very useful for table design where there is TS in the key but a long history
> (Like a year of sensor data).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)