wchevreuil commented on code in PR #5492:
URL: https://github.com/apache/hbase/pull/5492#discussion_r1480494819
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java:
##########
@@ -1659,6 +1662,43 @@ public RegionLoad createRegionLoad(final String
encodedRegionName) throws IOExce
return r != null ? createRegionLoad(r, null, null) : null;
}
+ public Map<String, Integer> uncacheStaleBlocks() {
+ Map<String, Pair<String, Long>> fullyCachedFiles =
+
this.getBlockCache().flatMap(BlockCache::getFullyCachedFiles).orElse(Collections.emptyMap());
+ Map<String, Integer> evictedFilesWithStaleBlocks = new
ConcurrentHashMap<>();
+
+ ExecutorService executor = Executors.newFixedThreadPool(6);
+
+ List<Callable<Void>> tasks = new ArrayList<>();
+
+ fullyCachedFiles.forEach((fileName, value) -> {
+ Callable<Void> task = () -> {
+ HRegion regionOnServer = getRegion(value.getFirst());
+ int blocksEvicted = (regionOnServer == null ||
!regionOnServer.isAvailable())
+ ? this.getBlockCache().get().evictBlocksByHfileName(fileName)
+ : 0;
+ evictedFilesWithStaleBlocks.put(fileName, blocksEvicted);
+ LOG.info(
+ "Uncached {} blocks belonging to the file {} as the region {} "
+ + "is not served by the region server {} anymore.",
+ blocksEvicted, fileName, value.getFirst(), this.getServerName());
+ return null;
+ };
+ tasks.add(task);
Review Comment:
We should do a single task per call to the method, not a task per file, that
would create yet another collection in memory with as many objects as the total
files cached. On large caches, that would be impacting.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java:
##########
@@ -1659,6 +1662,43 @@ public RegionLoad createRegionLoad(final String
encodedRegionName) throws IOExce
return r != null ? createRegionLoad(r, null, null) : null;
}
+ public Map<String, Integer> uncacheStaleBlocks() {
+ Map<String, Pair<String, Long>> fullyCachedFiles =
+
this.getBlockCache().flatMap(BlockCache::getFullyCachedFiles).orElse(Collections.emptyMap());
+ Map<String, Integer> evictedFilesWithStaleBlocks = new
ConcurrentHashMap<>();
+
+ ExecutorService executor = Executors.newFixedThreadPool(6);
Review Comment:
The threadpool should be global for HRegionServer. We don't want a new
threadpool created on every client call to this method, that would potentially
kill the RS if an operator calls it multiple times on a short period.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java:
##########
@@ -1659,6 +1662,43 @@ public RegionLoad createRegionLoad(final String
encodedRegionName) throws IOExce
return r != null ? createRegionLoad(r, null, null) : null;
}
+ public Map<String, Integer> uncacheStaleBlocks() {
+ Map<String, Pair<String, Long>> fullyCachedFiles =
+
this.getBlockCache().flatMap(BlockCache::getFullyCachedFiles).orElse(Collections.emptyMap());
+ Map<String, Integer> evictedFilesWithStaleBlocks = new
ConcurrentHashMap<>();
+
+ ExecutorService executor = Executors.newFixedThreadPool(6);
+
+ List<Callable<Void>> tasks = new ArrayList<>();
+
+ fullyCachedFiles.forEach((fileName, value) -> {
+ Callable<Void> task = () -> {
+ HRegion regionOnServer = getRegion(value.getFirst());
+ int blocksEvicted = (regionOnServer == null ||
!regionOnServer.isAvailable())
+ ? this.getBlockCache().get().evictBlocksByHfileName(fileName)
+ : 0;
+ evictedFilesWithStaleBlocks.put(fileName, blocksEvicted);
+ LOG.info(
+ "Uncached {} blocks belonging to the file {} as the region {} "
+ + "is not served by the region server {} anymore.",
+ blocksEvicted, fileName, value.getFirst(), this.getServerName());
+ return null;
+ };
+ tasks.add(task);
+ });
+
+ try {
+ executor.invokeAll(tasks);
+ } catch (InterruptedException e) {
+ Thread.currentThread().interrupt();
+ LOG.error("Thread interrupted while processing tasks for uncaching stale
blocks: {}",
+ e.getMessage());
+ } finally {
+ executor.shutdown();
+ }
+ return evictedFilesWithStaleBlocks;
Review Comment:
We should think about an alternative result here, as this map is effectively
being updated by the background task, which can be confusing for the caller. I
would rather make this as void, or a boolean, returning true indicating that
the task is submitted and is running in the background.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]