jacob-leblanc commented on a change in pull request #707: HBASE-23066 Allow
cache on write during compactions when prefetching …
URL: https://github.com/apache/hbase/pull/707#discussion_r335251644
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
##########
@@ -319,6 +337,13 @@ public boolean shouldPrefetchOnOpen() {
return this.prefetchOnOpen;
}
+ /**
+ * @return true if blocks should be cached while writing during compaction, false if not
+ */
+ public boolean shouldCacheCompactedBlocksOnWrite() {
+ return this.prefetchCompactedDataOnWrite && this.prefetchOnOpen;
Review comment:
Thanks for looking at this. My understanding is that when prefetch is
enabled, the new file is read into the cache after compaction completes
anyway, so the cache size requirements are the same with this new setting
enabled. That is why I limited the scope of cache on write to cases where
prefetching is enabled: it is simply a more efficient way to load the cache,
doing it while we write the data out rather than reading it back after
compaction is done, which I've found is very expensive when the data is in S3.
As far as the name goes, I struggled to come up with something intuitive:
how do I explain in the name alone that this only applies when prefetching is
on? I tried to convey "when prefetching, do the prefetch of compacted data on
write." I'm not in love with the name and I'm open to suggestions. I didn't
want to give the false impression that all compacted data is going to be cached
on write. Maybe "cacheCompactedDataOnWriteIfPrefetching"? Is that too wordy?
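To make the intended behavior concrete, here is a minimal sketch of the gating
described above. The class name CompactionCacheGate is hypothetical and is not
the real CacheConfig; only the two flag names and the method name are taken
from the diff. It shows that compacted blocks would be cached on write only
when both flags are on.

```java
// Hypothetical stand-in for illustration only; this is NOT the real CacheConfig.
final class CompactionCacheGate {
  // Flag names are taken from the diff under review.
  private final boolean prefetchOnOpen;
  private final boolean prefetchCompactedDataOnWrite;

  CompactionCacheGate(boolean prefetchOnOpen, boolean prefetchCompactedDataOnWrite) {
    this.prefetchOnOpen = prefetchOnOpen;
    this.prefetchCompactedDataOnWrite = prefetchCompactedDataOnWrite;
  }

  /**
   * Mirrors the condition in the diff: cache compacted blocks on write only
   * when the new setting is on AND prefetch-on-open is on.
   */
  boolean shouldCacheCompactedBlocksOnWrite() {
    return prefetchCompactedDataOnWrite && prefetchOnOpen;
  }
}
```

With prefetch off, the new setting has no effect, so turning it on by itself
never causes compacted data to be cached on write.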