BukrosSzabolcs commented on code in PR #4418: URL: https://github.com/apache/hbase/pull/4418#discussion_r874096575
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java:
##########
@@ -1867,6 +1870,10 @@ executorService.new ExecutorConfig().setExecutorType(ExecutorType.RS_SNAPSHOT_OP
       choreService.scheduleChore(brokenStoreFileCleaner);
     }
+    if (this.rsMobFileCleanerChore != null) {
+      choreService.scheduleChore(rsMobFileCleanerChore);

Review Comment:
   Let me clarify the differences between the two cleaners.

   RSMobFileCleanerChore:
   - runs on the RS, so it has access to the files currently being written and to the active storefile list
   - can only archive mob files created by regions hosted on the current RS
   - only reads hfiles belonging to regions hosted by the current RS when looking for references

   This allows it to do the majority of the necessary cleanup as efficiently as possible.

   MobFileCleanerChore:
   - runs on the Master
   - can only archive mob files created by archived regions (regions that no longer exist in the /data folder). Because of this, these mob files can no longer be "currently written", so we do not need the data that is only available on the RS
   - reads every single hfile in /data with a mob-enabled CF. This is necessary because it is the only way to find out whether any of these mob files still has active references

   So yes, there is an overlap: the same hfiles could be read by both cleaners, but any given mob file is archived by one or the other of them. The cleaner on the Master is wasteful and not especially elegant, but because there is no centralized mob tracking, we do not have a better way of collecting the references.
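
   To make the split of responsibilities concrete, here is a minimal sketch of the scoping rule described above. It is not code from this PR: the class `MobCleanerScope`, the enum `Cleaner`, and the method `responsibleCleaner` are hypothetical names, regions are modelled as plain strings, and the reference check that both chores perform before archiving is left out.

```java
import java.util.Set;

// Hypothetical illustration only; none of these names exist in HBase.
final class MobCleanerScope {

  enum Cleaner {
    RS_MOB_FILE_CLEANER,   // RSMobFileCleanerChore on the current RS
    MASTER_MOB_FILE_CLEANER, // MobFileCleanerChore on the Master
    NOT_IN_SCOPE           // left to the chore on whichever RS hosts the region
  }

  /**
   * Decides, from the point of view of one region server and the Master,
   * which cleaner is allowed to archive a mob file.
   *
   * @param creatorRegion    region that created the mob file (assumed to be known)
   * @param regionsOnThisRs  regions hosted by the current RS
   * @param regionsInDataDir regions that still exist under /data
   */
  static Cleaner responsibleCleaner(String creatorRegion,
                                    Set<String> regionsOnThisRs,
                                    Set<String> regionsInDataDir) {
    if (regionsOnThisRs.contains(creatorRegion)) {
      // The RS can see files being written and the active storefile list.
      return Cleaner.RS_MOB_FILE_CLEANER;
    }
    if (!regionsInDataDir.contains(creatorRegion)) {
      // Archived region: the mob file can no longer be "currently written".
      return Cleaner.MASTER_MOB_FILE_CLEANER;
    }
    // The region is live but hosted on another RS.
    return Cleaner.NOT_IN_SCOPE;
  }

  public static void main(String[] args) {
    Set<String> hosted = Set.of("region-a");
    Set<String> inData = Set.of("region-a", "region-b");
    System.out.println(responsibleCleaner("region-a", hosted, inData));
    System.out.println(responsibleCleaner("region-b", hosted, inData));
    System.out.println(responsibleCleaner("region-gone", hosted, inData));
  }
}
```

   Under these assumptions the two scopes never overlap for archiving, even though both cleaners may read the same hfiles while collecting references.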