[
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984140#comment-16984140
]
Anoop Sam John commented on HBASE-23066:
----------------------------------------
Right now if the prefetch is turned on, we will do the data prefetch as part of
the HFile open. Once a compaction is over and committed, the new file reader
gets opened and we will give the prefetch job to the prefetch thread pool.
There are only 4 threads in that pool by default, so the prefetch is not very
aggressive.
With this new config, we write to the cache as part of creating the HFile
itself: blocks are added to the cache as and when they are written to the
HFile, so it is aggressive. Yes, it helps make the new file's data available
from time 0. It also avoids the one extra HFile read that the prefetch thread
would otherwise need in order to do the caching. The concern is that this in
effect demands 2x the cache size, because the data of the files being
compacted might already be in the cache. While the new file is being written,
those old files are still valid; the new one is not even committed by the RS
yet. The size concern is big when it is a major compaction! The comment from
@chenxu seems valid. Should we look at that angle as well?
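To put rough numbers on the 2x concern, here is a back-of-the-envelope sketch.
It is not HBase code: the class, the method names and the sizes are all made up
for illustration, and it assumes the old files' blocks are evicted once the
compaction is committed, before the prefetch has refilled the cache.

```java
public class CompactionCacheFootprint {

    // Peak cached bytes when the compaction output is cached as it is written.
    // The input files stay valid (and cached) until the new file is committed,
    // so both the old and the new blocks are resident at the same time.
    static long peakWithCacheOnWrite(long cachedInputBytes, long outputBytes) {
        return cachedInputBytes + outputBytes;
    }

    // Peak cached bytes when the output is only prefetched after the commit.
    // Assumption: the old blocks are gone before the new ones are loaded, so
    // the cache never holds more than the larger of the two sets.
    static long peakWithPrefetchAfterCommit(long cachedInputBytes, long outputBytes) {
        return Math.max(cachedInputBytes, outputBytes);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long cachedInput = 10 * gib; // hypothetical: all compacting files are cached
        long output = 9 * gib;       // hypothetical: major compaction output size

        System.out.println("cache on write peak (GiB): "
                + peakWithCacheOnWrite(cachedInput, output) / gib);
        System.out.println("prefetch after commit peak (GiB): "
                + peakWithPrefetchAfterCommit(cachedInput, output) / gib);
    }
}
```

For a major compaction the output is close to the size of the whole store, so
the first number approaches twice the second.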
On a side note (not related to this issue): when cache on write is ON and
prefetch is also ON, do we cache the flushed files twice? When a block is
written, it has already been added to the cache. Later, as part of the HFile
reader open, the prefetch threads will read it again and add it to the cache!
> Allow cache on write during compactions when prefetching is enabled
> -------------------------------------------------------------------
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
> Issue Type: Improvement
> Components: Compaction, regionserver
> Affects Versions: 1.4.10
> Reporter: Jacob LeBlanc
> Assignee: Jacob LeBlanc
> Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png,
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are
> small enough to fit into a cache (or the cache is large enough),
> prefetchOnOpen can be enabled to make the entire table available in cache
> after the initial region opening is completed. Any new data can also be
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction.
> We found very poor performance after compactions for tables under heavy read
> load and a slower backing filesystem (S3). After a compaction the prefetching
> threads need to compete with threads servicing read requests and get
> constantly blocked as a result.
> This is a proposal to introduce a new cache configuration option that would
> cache blocks on write during compaction for any column family that has
> prefetch enabled. This would virtually guarantee all blocks are kept in cache
> after the initial prefetch on open is completed allowing for guaranteed
> steady read performance despite a slow backing file system.