[
https://issues.apache.org/jira/browse/HBASE-27043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540536#comment-17540536
]
Hudson commented on HBASE-27043:
--------------------------------
Results for branch branch-2
[build #546 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/546/]:
(/) *{color:green}+1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/546/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/546/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/546/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/546/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> Let lock wait timeout to improve performance of SnapshotHFileCleaner
> --------------------------------------------------------------------
>
> Key: HBASE-27043
> URL: https://issues.apache.org/jira/browse/HBASE-27043
> Project: HBase
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 3.0.0-alpha-2, 2.4.12
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
> Attachments: clearner-before-and-after.png, namenode-callqueue.png
>
>
> Currently, the hfile cleaner uses the dir scanning threads to find deletable
> files by checking all the files under the scanned directories through the
> cleaner chain. Before scanning a directory, the cleaner sorts the
> subdirectories by consumed space, but getContentSummary is a time-consuming
> operation for HDFS.
> SnapshotHFileCleaner filters out the files unreferenced by any snapshot for
> deletion, and it tries to acquire the write lock of
> SnapshotManager#takingSnapshotLock before determining the deletable files.
> [https://github.com/apache/hbase/blob/ad64a9baae2ef8ee56aa3ed6b96cb3d51f5daf0a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotFileCache.java#L195]
> But when any snapshot is being taken and the cleaner fails to acquire the
> lock, all the scanned files are determined to be non-deletable, and all the
> dir scanning threads scan and getContentSummary other dirs (where no files
> are deletable either) one by one until the snapshot taking lock is released.
> This is inefficient; we should let the cleaner wait for the lock to
> determine whether the files it currently holds are deletable, instead of
> meaninglessly scanning and calling getContentSummary on other/new
> directories while the lock is held by snapshot taking operations.
> I deployed this optimization in our production environment, and the
> improvement is very noticeable.
>
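The change described in the quoted issue can be sketched as follows. This is a hypothetical illustration, not the actual HBase code: the class and method names (`CleanerLockSketch`, `tryCleanNonBlocking`, `tryCleanWithTimeout`) are invented for the example. The idea is to replace a non-blocking `tryLock()`, which throws away an entire scan whenever a snapshot is in flight, with a bounded `tryLock(timeout, unit)` so the cleaner waits briefly and reuses the files it has already collected.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the improvement; names are illustrative only.
public class CleanerLockSketch {
  private final ReentrantReadWriteLock takingSnapshotLock = new ReentrantReadWriteLock();

  /** Old behavior: give up immediately if a snapshot holds the lock. */
  public boolean tryCleanNonBlocking() {
    if (!takingSnapshotLock.writeLock().tryLock()) {
      return false; // every scanned file is reported non-deletable
    }
    try {
      return true; // determine deletable files under the lock
    } finally {
      takingSnapshotLock.writeLock().unlock();
    }
  }

  /** Improved behavior: wait up to timeoutMs for the snapshot to finish. */
  public boolean tryCleanWithTimeout(long timeoutMs) {
    try {
      if (!takingSnapshotLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
        return false; // lock still held after the wait; skip this round
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    try {
      return true; // the scan already performed is not wasted
    } finally {
      takingSnapshotLock.writeLock().unlock();
    }
  }
}
```

With no snapshot in progress both variants succeed; the difference shows up only under contention, where the timed variant trades a short bounded wait for avoiding repeated scans and getContentSummary calls.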
--
This message was sent by Atlassian Jira
(v8.20.7#820007)