[ https://issues.apache.org/jira/browse/HBASE-27043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaolin Ha updated HBASE-27043:
-------------------------------
    Affects Version/s: 2.4.12
                       3.0.0-alpha-2
                           (was: 2.5.0)
                           (was: 3.0.0-alpha-3)

> Let lock wait timeout to improve performance of SnapshotHFileCleaner
> --------------------------------------------------------------------
>
>                 Key: HBASE-27043
>                 URL: https://issues.apache.org/jira/browse/HBASE-27043
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha-2, 2.4.12
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-3
>
>         Attachments: clearner-before-and-after.png, namenode-callqueue.png
>
>
> Currently, the hfile cleaner uses the dir scanning threads to find deletable
> files by checking all the files under the scanned directories through the
> cleaner chain. Before scanning a directory, the cleaner sorts the
> subdirectories by consumed space, but getContentSummary is a time-consuming
> operation for HDFS.
> SnapshotHFileCleaner filters out the files that are not referenced by any
> snapshot so that they can be deleted, and it tries to get the write lock of
> SnapshotManager#takingSnapshotLock before determining the deletable files.
> [https://github.com/apache/hbase/blob/ad64a9baae2ef8ee56aa3ed6b96cb3d51f5daf0a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotFileCache.java#L195]
> But when a snapshot is being taken and the cleaner fails to get the lock,
> all the scanned files are determined to be non-deletable, and all the dir
> scanning threads go on to scan and call getContentSummary on other dirs
> (where no files are deletable either) one by one until the snapshot-taking
> lock is released.
> This is inefficient. We should let the cleaner wait for the lock, with a
> timeout, to determine whether the files it currently holds are deletable,
> instead of pointlessly scanning and calling getContentSummary on other/new
> directories while the lock is held by snapshot-taking operations.
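The lock-wait idea described above can be sketched in Java roughly as
follows. This is a minimal illustration, not the actual HBase patch: the
class SnapshotCleanerLockSketch, its waitMillis parameter and the
referencedBySnapshot predicate are hypothetical names, and the sketch assumes
the snapshot lock is a java.util.concurrent ReentrantReadWriteLock whose read
side is held while a snapshot is being taken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

/** Hypothetical stand-in for the cleaner's lock handling; not HBase code. */
class SnapshotCleanerLockSketch {
  // Stand-in for SnapshotManager#takingSnapshotLock. Assumption: snapshot
  // operations hold the read lock, the cleaner asks for the write lock.
  private final ReentrantReadWriteLock takingSnapshotLock = new ReentrantReadWriteLock();

  // Hypothetical setting: how long the cleaner waits for the lock.
  private final long waitMillis;

  SnapshotCleanerLockSketch(long waitMillis) {
    this.waitMillis = waitMillis;
  }

  ReentrantReadWriteLock lock() {
    return takingSnapshotLock;
  }

  /**
   * Returns the files that are safe to delete, or an empty list if the write
   * lock could not be obtained within waitMillis.
   */
  List<String> getDeletableFiles(List<String> files, Predicate<String> referencedBySnapshot) {
    boolean locked;
    try {
      // Wait up to waitMillis for the lock instead of giving up immediately.
      locked = takingSnapshotLock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      locked = false;
    }
    if (!locked) {
      // Timed out or interrupted: be conservative, nothing is deletable.
      return new ArrayList<>();
    }
    try {
      List<String> deletable = new ArrayList<>();
      for (String file : files) {
        if (!referencedBySnapshot.test(file)) {
          deletable.add(file);
        }
      }
      return deletable;
    } finally {
      takingSnapshotLock.writeLock().unlock();
    }
  }
}
```

With a non-zero wait, a cleaner thread that already holds a batch of files
blocks until the snapshot finishes (or the timeout expires) and then
evaluates that batch, rather than returning an empty result and moving on to
scan another directory.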
> I deployed this optimization in our production environment, and the effect
> is very obvious.
>

--
This message was sent by Atlassian Jira
(v8.20.7#820007)