Peter Somogyi created HBASE-27590:
-------------------------------------
Summary: Change Iterable to List in CleanerChore
Key: HBASE-27590
URL: https://issues.apache.org/jira/browse/HBASE-27590
Project: HBase
Issue Type: Improvement
Reporter: Peter Somogyi
Assignee: Peter Somogyi
The HFileCleaners can perform poorly on a large /archive area when used
with slow storage such as S3. The snapshot write lock in SnapshotFileCache is
held while file metadata is fetched from S3, so even with multiple cleaner
threads only a single cleaner can effectively delete files from the archive.
Simply changing the parameter type in FileCleanerDelegate from Iterable to
List forces the file metadata collection to happen before SnapshotHFileCleaner
runs, outside the lock.
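To illustrate why the parameter type matters, here is a minimal, self-contained sketch (simplified stand-in code, not the actual HBase classes): a lazy Iterable re-fetches metadata on every iteration, potentially while a lock is held, whereas a materialized List pays the fetch cost exactly once, up front.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LazyVsEager {
    // Counts simulated metadata fetches (stand-in for S3 round-trips).
    static int fetches = 0;

    // Simulated slow metadata fetch, e.g. a HEAD request against S3.
    static String fetchMetadata(String path) {
        fetches++;
        return path + ":meta";
    }

    // Lazy Iterable: every consumer that iterates triggers the slow fetch
    // again, possibly inside a critical section.
    static Iterable<String> lazy(List<String> paths) {
        return () -> new Iterator<String>() {
            final Iterator<String> it = paths.iterator();
            public boolean hasNext() { return it.hasNext(); }
            public String next() { return fetchMetadata(it.next()); }
        };
    }

    // Eager List: metadata is materialized once, before any lock is taken.
    static List<String> eager(List<String> paths) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            out.add(fetchMetadata(p));
        }
        return out;
    }
}
```

Iterating the lazy Iterable twice performs the slow fetch twice per file; iterating the eager List any number of times performs it only once per file.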
Running with the cleaner configuration below, the lock hold time in
SnapshotFileCache dropped from 45000 ms to 100 ms for 1000 files in a
directory. The complete evaluation and deletion for this folder took the same
overall time, but because the file metadata fetch from S3 happened outside the
lock, the multiple cleaner threads were able to run concurrently.
{noformat}
hbase.cleaner.directory.sorting=false
hbase.cleaner.scan.dir.concurrent.size=0.75
hbase.regionserver.hfilecleaner.small.thread.count=16
hbase.regionserver.hfilecleaner.large.thread.count=8
{noformat}
The files to evaluate are already passed as a List to
CleanerChore.checkAndDeleteFiles, but they are currently converted to an
Iterable before the configured cleaners run their checks.
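The resulting pattern can be sketched as follows (an illustrative simplification, not the actual SnapshotFileCache code; the class and method names here are hypothetical): the caller materializes the slow per-file metadata into a List first, so the cache lock is held only for the cheap in-memory membership check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SnapshotCacheSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> snapshotFiles = new HashSet<>();

    // Registers a file as referenced by a snapshot (held under the write lock).
    public void addSnapshotFile(String file) {
        lock.writeLock().lock();
        try {
            snapshotFiles.add(file);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // 'files' is a fully materialized List: no S3 round-trips can occur
    // while the lock is held, so concurrent cleaner threads are not blocked
    // behind slow storage I/O.
    public List<String> getUnreferenced(List<String> files) {
        List<String> deletable = new ArrayList<>();
        lock.readLock().lock();
        try {
            for (String f : files) {
                if (!snapshotFiles.contains(f)) {
                    deletable.add(f);
                }
            }
        } finally {
            lock.readLock().unlock();
        }
        return deletable;
    }
}
```

With a List parameter, whatever slow metadata collection is needed has already completed before getUnreferenced is entered, which is the essence of the proposed change.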
--
This message was sent by Atlassian Jira
(v8.20.10#820010)