Peter Somogyi created HBASE-27590:
-------------------------------------

             Summary: Change Iterable to List in CleanerChore
                 Key: HBASE-27590
                 URL: https://issues.apache.org/jira/browse/HBASE-27590
             Project: HBase
          Issue Type: Improvement
            Reporter: Peter Somogyi
            Assignee: Peter Somogyi


The HFileCleaners can perform poorly on a large /archive area when used with 
slow storage such as S3. The snapshot write lock in SnapshotFileCache is held 
while file metadata is fetched from S3, so even with multiple cleaner threads 
only a single cleaner can effectively delete files from the archive at a time.
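
To illustrate the contention (this is not the HBase code; the class and method names are hypothetical stand-ins): a lazily evaluated Iterable defers the slow per-file metadata calls until it is iterated, which in this case happens while the snapshot lock is held, while a pre-materialized List keeps the slow work outside the critical section.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Illustrative sketch only: names here are hypothetical, not HBase APIs.
public class LazyIterationUnderLock {

  private final ReentrantLock snapshotLock = new ReentrantLock();

  // Lazy view: every iteration step performs a slow remote call (e.g. to S3).
  static Iterable<String> lazyStatuses(List<String> paths, Function<String, String> fetch) {
    return () -> paths.stream().map(fetch).iterator();
  }

  // Problematic pattern: the slow fetches run inside the critical section,
  // so other threads that need the lock are blocked for the whole scan.
  void evaluateUnderLock(Iterable<String> statuses) {
    snapshotLock.lock();
    try {
      for (String status : statuses) {
        // each step may be a remote round trip while the lock is held
      }
    } finally {
      snapshotLock.unlock();
    }
  }

  // Improved pattern: materialize the metadata into a List first, then lock.
  void evaluateMaterialized(Iterable<String> statuses) {
    List<String> materialized = new ArrayList<>();
    statuses.forEach(materialized::add); // slow work done outside the lock
    snapshotLock.lock();
    try {
      for (String status : materialized) {
        // only in-memory checks happen while the lock is held
      }
    } finally {
      snapshotLock.unlock();
    }
  }
}
{code}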

Simply by changing the type of the parameter passed in FileCleanerDelegate from 
Iterable to List, the file metadata collection is performed before 
SnapshotHFileCleaner runs, i.e. before the lock is taken.
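
A minimal sketch of the direction of this change (the real FileCleanerDelegate interface carries additional methods; only the parameter type is the point here):
{code:java}
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

// Sketch only; not the full FileCleanerDelegate interface.
interface FileCleanerDelegateSketch {

  // Before: an Iterable could be a lazy view, deferring the metadata fetches
  // until the delegate (e.g. SnapshotHFileCleaner) iterates it under its lock.
  // Iterable<FileStatus> getDeletableFiles(Iterable<FileStatus> files);

  // After: a List guarantees the file metadata has already been collected
  // before the delegate starts its evaluation.
  Iterable<FileStatus> getDeletableFiles(List<FileStatus> files);
}
{code}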

Running with the cleaner configuration below, I observed that the time the lock 
was held in SnapshotFileCache went down from 45000ms to 100ms when processing 
1000 files in a directory. The complete evaluation and deletion for this folder 
took the same time, but since the file metadata fetch from S3 was done outside 
of the lock, multiple cleaner threads were able to run concurrently.
{noformat}
hbase.cleaner.directory.sorting=false
hbase.cleaner.scan.dir.concurrent.size=0.75
hbase.regionserver.hfilecleaner.small.thread.count=16
hbase.regionserver.hfilecleaner.large.thread.count=8
{noformat}

The files to evaluate are already passed as a List to 
CleanerChore.checkAndDeleteFiles, but the collection is converted to an 
Iterable before the checks are run by the configured cleaners.
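
A simplified sketch of keeping the List type while running the configured cleaners (it reuses the FileCleanerDelegateSketch interface from the sketch above; the real checkAndDeleteFiles also validates paths, performs the deletes and handles errors):
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

// Simplified sketch of the filtering loop in checkAndDeleteFiles.
final class CheckAndDeleteSketch {

  static List<FileStatus> runCleaners(List<FileStatus> validFiles,
      List<FileCleanerDelegateSketch> cleaners) {
    // Keep the files in a List from end to end so no delegate has to trigger
    // storage calls while holding its own locks.
    List<FileStatus> deletable = validFiles;
    for (FileCleanerDelegateSketch cleaner : cleaners) {
      List<FileStatus> next = new ArrayList<>();
      cleaner.getDeletableFiles(deletable).forEach(next::add);
      deletable = next;
    }
    return deletable;
  }
}
{code}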



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
