[ 
https://issues.apache.org/jira/browse/HBASE-25899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350777#comment-17350777
 ] 

Hudson commented on HBASE-25899:
--------------------------------

Results for branch branch-2
        [build #259 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/259/]:
 (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/259/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/259//console].


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/259/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/259/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for 
details|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/259//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Improve efficiency of SnapshotHFileCleaner
> ------------------------------------------
>
>                 Key: HBASE-25899
>                 URL: https://issues.apache.org/jira/browse/HBASE-25899
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 3.0.0-alpha-1, 2.0.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0
>
>         Attachments: 78631.jstack, cleaner-result.png
>
>
> We have met the same problem of thousands of threads as in HBASE-22867, but 
> even after that fix, the cleaner has become less efficient.
> From the jstack we can see that most dir-scan threads are blocked in 
> SnapshotHFileCleaner#getDeletableFiles:
> {code:java}
> "dir-scan-pool-19" #694 daemon prio=5 os_prio=0 tid=0x0000000002ab1800 
> nid=0x26a7e waiting for monitor entry [0x00007fb0a9913000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:74)
>         - waiting to lock <0x00007fb148737048> (a 
> org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner)
>         at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:498)
>         at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:246)
>         at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$41/1187372779.act(Unknown
>  Source)
>         at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:358)
>         at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:246)
>         at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:255)
>         at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$38/2003131501.run(Unknown
>  Source)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745){code}
> while all the HFileCleaner threads are waiting on the delete-task queue:
> {code:java}
> "gha-data-hbase0002:16000.activeMasterManager-HFileCleaner.large.2-1621210982419"
>  #358 daemon prio=5 os_prio=0 tid=0x00007fb967fc0000 nid=0x266f2 waiting on 
> condition [0x00007fb0c57d6000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007fb1486db9f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at 
> org.apache.hadoop.hbase.util.StealJobQueue.take(StealJobQueue.java:106)
>         at 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner.consumerLoop(HFileCleaner.java:264)
>         at 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner$1.run(HFileCleaner.java:233)
> {code}
> So we need to increase the speed of scanning files. But since 
> getDeletableFiles is a synchronized method, increasing the number of dir-scan 
> threads cannot solve this problem. 
> After looking through the code in SnapshotHFileCleaner and 
> SnapshotFileCache, I think the lock granularity in them should be optimized.
>  
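> The idea can be sketched as follows. This is a minimal, hypothetical sketch 
> (the class and method names are illustrative, not the actual HBase API): a 
> {{ReentrantReadWriteLock}} lets many dir-scan threads query the snapshot 
> file cache concurrently, while only a cache refresh takes the exclusive 
> write lock, instead of serializing every lookup behind a method-level 
> {{synchronized}}.
> {code:java}
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
> import java.util.stream.Collectors;
> 
> // Hypothetical sketch, NOT the real SnapshotFileCache: lookups take only
> // the read lock, so concurrent dir-scan threads no longer block each other.
> public class SnapshotCacheSketch {
>     private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
>     private final Set<String> filesReferencedBySnapshots = new HashSet<>();
> 
>     // Replaces a coarse synchronized method: many readers may run at once.
>     public List<String> getUnreferencedFiles(List<String> candidates) {
>         lock.readLock().lock();
>         try {
>             return candidates.stream()
>                 .filter(f -> !filesReferencedBySnapshots.contains(f))
>                 .collect(Collectors.toList());
>         } finally {
>             lock.readLock().unlock();
>         }
>     }
> 
>     // Refreshing the referenced-file set is the only exclusive section.
>     public void refresh(Set<String> referenced) {
>         lock.writeLock().lock();
>         try {
>             filesReferencedBySnapshots.clear();
>             filesReferencedBySnapshots.addAll(referenced);
>         } finally {
>             lock.writeLock().unlock();
>         }
>     }
> }
> {code}
> With this shape, adding more dir-scan threads actually helps, because only 
> the periodic refresh contends with the lookups.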



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
