[
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395177#comment-16395177
]
Saad Mufti commented on HBASE-20045:
------------------------------------
{quote}
So, just roughly, can you say what your bucket cache size is? And I think it
is file mode and that file is in S3.
{quote}
In our case, like I said, all our actual HFiles are in S3, and the bucket
cache is backed by locally mounted disks (actually EBS volumes in AWS, but
for all intents and purposes they act like local disks). The disk size was
chosen to be roughly 1.5 times the total data size in S3.
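To make that concrete, here is a rough sketch of the file-mode bucket cache
settings involved (the path and size below are made-up placeholders, not our
actual values; in practice these keys live in hbase-site.xml on each region
server):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BucketCacheConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Back the bucket cache with a file on a locally mounted (EBS) volume.
    conf.set("hbase.bucketcache.ioengine", "file:/mnt/ebs0/bucketcache.data");
    // Bucket cache size in MB; we size the volume ~1.5x the total data in S3.
    conf.set("hbase.bucketcache.size", "1048576");
  }
}
{code}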
Also, the reason I think we would never run out of bucket cache in our case
is that, while we don't have an HBase-level TTL on all cells (we do have it
on some column families, where it makes sense), we do have business
expiration criteria and a bulk job that runs in a separate cluster,
periodically scans all the data for expired rows, and actively deletes them
from HBase. Our total data size has been more or less stable for a few years
now.
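For the families where we do set a TTL, it's just the standard per-column-
family setting; a quick sketch with made-up table/family names:
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class TtlSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Admin admin = conn.getAdmin()) {
      // Set a 30-day TTL on one column family (names are placeholders);
      // expired cells are dropped when compactions rewrite the files.
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
          .newBuilder(Bytes.toBytes("d"))
          .setTimeToLive((int) TimeUnit.DAYS.toSeconds(30))
          .build();
      admin.modifyColumnFamily(TableName.valueOf("events"), cf);
    }
  }
}
{code}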
Cheers.
> When running compaction, cache recent blocks.
> ---------------------------------------------
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
> Issue Type: New Feature
> Components: BlockCache, Compaction
> Affects Versions: 2.0.0-beta-1
> Reporter: Jean-Marc Spaggiari
> Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for
> use cases where most queries are against recent data. However, as soon as
> there is a compaction, those blocks are evicted. It would be interesting to
> have a table-level parameter to say "When compacting, cache blocks less than
> 24 hours old". That way, when running a compaction, all blocks where some
> data is less than 24h old would be automatically cached.
>
> Very useful for table designs where there is a timestamp (TS) in the key
> but a long history (like a year of sensor data).
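For illustration, a rough sketch of the kind of time-window check the
proposal above implies when deciding whether a freshly compacted block should
be cached. This is not actual HBase internals, and all names here are
hypothetical:
{code:java}
import java.util.concurrent.TimeUnit;

/**
 * Illustrative only: decide whether a block written during compaction
 * should also be cached, based on the newest cell timestamp it holds.
 * Not actual HBase internals; the window would come from a (hypothetical)
 * table-level attribute such as CACHE_COMPACTED_BLOCKS_TTL => '24h'.
 */
public class CompactionCachePredicate {

  private final long cacheWindowMillis;

  public CompactionCachePredicate(long window, TimeUnit unit) {
    this.cacheWindowMillis = unit.toMillis(window);
  }

  /**
   * @param maxCellTimestampMillis newest cell timestamp in the block
   * @return true if the block holds data newer than the window
   */
  public boolean shouldCacheOnCompaction(long maxCellTimestampMillis) {
    long ageMillis = System.currentTimeMillis() - maxCellTimestampMillis;
    return ageMillis <= cacheWindowMillis;
  }

  public static void main(String[] args) {
    CompactionCachePredicate p =
        new CompactionCachePredicate(24, TimeUnit.HOURS);
    long oneHourAgo = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(1);
    long twoDaysAgo = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(2);
    System.out.println(p.shouldCacheOnCompaction(oneHourAgo));  // true: cache
    System.out.println(p.shouldCacheOnCompaction(twoDaysAgo));  // false: skip
  }
}
{code}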
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)