taklwu commented on a change in pull request #3684:
URL: https://github.com/apache/hbase/pull/3684#discussion_r708712194
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java
##########
@@ -60,6 +62,14 @@ public ClientSideRegionScanner(Configuration conf,
FileSystem fs,
region = HRegion.newHRegion(CommonFSUtils.getTableDir(rootDir,
htd.getTableName()), null, fs,
conf, hri, htd, null);
region.setRestoredRegion(true);
+ // non RS process does not have a block cache, and this a client side scanner,
+ // create one for MapReduce jobs to cache the INDEX block
+ conf.setFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
+ conf.getFloat(HConstants.HBASE_CLIENT_SCANNER_BLOCK_CACHE_SIZE_KEY,
+ HConstants.HBASE_CLIENT_SCANNER_BLOCK_CACHE_SIZE_DEFAULT));
Review comment:
[DISCUSS] Should we simply set `HFILE_BLOCK_CACHE_SIZE_KEY` to `0.1f`
directly, or should we not set it at all?
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java
##########
@@ -60,6 +62,14 @@ public ClientSideRegionScanner(Configuration conf,
FileSystem fs,
region = HRegion.newHRegion(CommonFSUtils.getTableDir(rootDir,
htd.getTableName()), null, fs,
conf, hri, htd, null);
region.setRestoredRegion(true);
+ // non RS process does not have a block cache, and this a client side scanner,
+ // create one for MapReduce jobs to cache the INDEX block
+ conf.setFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
+ conf.getFloat(HConstants.HBASE_CLIENT_SCANNER_BLOCK_CACHE_SIZE_KEY,
+ HConstants.HBASE_CLIENT_SCANNER_BLOCK_CACHE_SIZE_DEFAULT));
+ // don't allow L2 bucket cache for non RS process to avoid unexpected disk usage.
+ conf.unset(HConstants.BUCKET_CACHE_IOENGINE_KEY);
Review comment:
[DISCUSS] `ClientSideRegionScanner` should only be used by downstream
clients, e.g. MR input formats like `TableSnapshotInputFormatImpl` and
`TableSnapshotScanner`, so we should not allow it to create an additional
disk cache. IMO we should disable the L2 bucket cache here, at least until
someone has a use case for it, to avoid unexpected resource usage.
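
To make the intended effect of the two overrides concrete, here is a minimal sketch. It uses `java.util.Properties` as a stand-in for Hadoop's `Configuration` so it runs without HBase on the classpath. The class name, the client scanner key string, and the `0.1f` default are assumptions for illustration only; the two other key strings are what I believe `HConstants` defines and should be checked against the source. Unlike the patch, the sketch works on a copy instead of mutating the caller's configuration.

```java
import java.util.Properties;

public class ClientScannerCacheConfigSketch {
  // Believed to match HConstants; verify against the HBase source.
  static final String HFILE_BLOCK_CACHE_SIZE_KEY = "hfile.block.cache.size";
  static final String BUCKET_CACHE_IOENGINE_KEY = "hbase.bucketcache.ioengine";

  // Hypothetical key string and default, chosen only for this sketch.
  static final String CLIENT_SCANNER_BLOCK_CACHE_SIZE_KEY =
      "example.client.scanner.block.cache.size";
  static final float CLIENT_SCANNER_BLOCK_CACHE_SIZE_DEFAULT = 0.1f;

  /** Returns a copy of conf with the client-side scanner overrides applied. */
  static Properties applyClientSideOverrides(Properties conf) {
    Properties copy = new Properties();
    copy.putAll(conf);

    // Size the on-heap block cache from the client scanner setting,
    // falling back to the (assumed) default when it is not configured.
    String size = copy.getProperty(
        CLIENT_SCANNER_BLOCK_CACHE_SIZE_KEY,
        Float.toString(CLIENT_SCANNER_BLOCK_CACHE_SIZE_DEFAULT));
    copy.setProperty(HFILE_BLOCK_CACHE_SIZE_KEY, size);

    // Drop any L2 bucket cache engine so a non-RS process never
    // allocates a file-backed cache on local disk.
    copy.remove(BUCKET_CACHE_IOENGINE_KEY);
    return copy;
  }

  public static void main(String[] args) {
    // Simulate a job that inherited region-server style settings.
    Properties conf = new Properties();
    conf.setProperty(HFILE_BLOCK_CACHE_SIZE_KEY, "0.4");
    conf.setProperty(BUCKET_CACHE_IOENGINE_KEY, "file:/tmp/bucketcache");

    Properties effective = applyClientSideOverrides(conf);
    System.out.println("block cache size: "
        + effective.getProperty(HFILE_BLOCK_CACHE_SIZE_KEY));
    System.out.println("bucket cache engine set: "
        + effective.containsKey(BUCKET_CACHE_IOENGINE_KEY));
  }
}
```

The point of the sketch is that an inherited region-server value for the block cache size is replaced by the client scanner value, and an inherited bucket cache engine is removed unconditionally, which is the behavior the two [DISCUSS] items question.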