[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941076#comment-16941076
 ] 

Jacob LeBlanc commented on HBASE-23066:
---------------------------------------

Regarding the request for numbers on what happens when the size of the data 
exceeds the cache, I'm not sure what is being asked for.

My thinking is that behavior in that regard is not going to be any different. 
Keep in mind: this setting only applies when prefetching is already enabled for 
the column family. In other words we are already going to read the new file 
entirely into cache. Enabling this setting only does that a little bit earlier, 
while we are writing the file out, to avoid the glut of cache misses that kills 
performance for a period of time after compaction finishes. So if other data 
will be evicted with the setting enabled, then it would be evicted without this 
patch as well. This is also why I'm not sure a per-table setting such as a 
warmup threshold is needed. In fact, I'd be happy if this were the default 
setting, as I don't see any negatives, but I'd understand keeping it disabled by 
default to limit risk.
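
The reasoning above can be illustrated with a minimal sketch. This is a toy 
model only: the class and method names are hypothetical and are not taken from 
the patch or from HBase's own classes, and it ignores cache capacity and 
eviction order. It assumes just two things stated above: the setting has an 
effect only when prefetching is enabled, and prefetching reads the whole new 
file into cache anyway.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; NOT HBase's actual classes or configuration keys.
final class CompactionCacheSketch {

    // Assumption from the comment: the new setting only takes effect when
    // prefetching is already enabled for the column family.
    static boolean cacheOnCompactionWrite(boolean prefetchEnabled, boolean settingEnabled) {
        return prefetchEnabled && settingEnabled;
    }

    // Returns the blocks of the newly compacted file that have been placed in
    // cache once the file is written and any prefetch pass has completed.
    static Set<String> blocksCachedAfterCompaction(List<String> newFileBlocks,
                                                   boolean prefetchEnabled,
                                                   boolean settingEnabled) {
        Set<String> cached = new LinkedHashSet<>();
        if (cacheOnCompactionWrite(prefetchEnabled, settingEnabled)) {
            // Blocks enter the cache early, while the compacted file is written.
            cached.addAll(newFileBlocks);
        }
        if (prefetchEnabled) {
            // Prefetch reads the entire new file into cache regardless.
            cached.addAll(newFileBlocks);
        }
        return cached;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("block-0", "block-1", "block-2");
        Set<String> withSetting = blocksCachedAfterCompaction(blocks, true, true);
        Set<String> withoutSetting = blocksCachedAfterCompaction(blocks, true, false);
        // Same final cache contents either way; only the timing differs.
        System.out.println(withSetting.equals(withoutSetting));
    }
}
```

In this model the same blocks end up cached with or without the setting, so 
the pressure on other cached data is the same; the setting only moves the 
caching earlier.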

> Allow cache on write during compactions when prefetching is enabled
> -------------------------------------------------------------------
>
>                 Key: HBASE-23066
>                 URL: https://issues.apache.org/jira/browse/HBASE-23066
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction, regionserver
>    Affects Versions: 1.4.10
>            Reporter: Jacob LeBlanc
>            Assignee: Jacob LeBlanc
>            Priority: Minor
>             Fix For: 1.5.0, 2.3.0
>
>         Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
