[ 
https://issues.apache.org/jira/browse/HBASE-29109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947962#comment-17947962
 ] 

Hudson commented on HBASE-29109:
--------------------------------

Results for branch branch-2.5
        [build #686 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/]:
 (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk17 hadoop 3.2.4 backward compatibility checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk17 hadoop 3.3.5 backward compatibility checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk17 hadoop 3.3.6 backward compatibility checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 source release artifact{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686//console].


(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/686//console].


> Process stage of TakeSnapshotHandler runs Async which is holding the 
> EXCLUSIVE Lock of Table for around 9 mins blocking ASSIGN procedures of SCP
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29109
>                 URL: https://issues.apache.org/jira/browse/HBASE-29109
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.5.8
>            Reporter: Prathyusha
>            Assignee: Prathyusha
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.5.12
>
>
> {{[SnapshotManager|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java#L659-L660]
>  does *prepare* and *process* as below}}
> {quote}handler.prepare();
> this.executorService.submit(handler); (process called here)
> {quote}
> For enabled tables, prepare() takes an EXCLUSIVE lock on the table with a 
> timeout. Only when process() starts running does the handler release that 
> EXCLUSIVE lock and take a SHARED one instead:
> {quote}try {
>   if (downgradeToSharedTableLock()) {
>     // release the exclusive lock and hold the shared lock instead
>     tableLockToRelease = master.getLockManager().createMasterLock(snapshotTable,
>       LockType.SHARED, this.getClass().getName() + ": take snapshot " + snapshot.getName());
>     tableLock.release();
>     boolean isTableLockAcquired = tableLockToRelease.tryAcquire(this.lockAcquireTimeoutMs);
>     if (!isTableLockAcquired) {
>       LOG.error("Could not acquire shared lock on table {} in {} ms", snapshotTable,
>         lockAcquireTimeoutMs);
>       throw new IOException("Could not acquire shared lock on table " + snapshotTable);
>     }
>   }
> {quote}
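
A rough, self-contained illustration of the pattern quoted above, not HBase code: the lock is taken synchronously in a prepare step, but it is only released when a task queued on an executor actually starts. The class name, method name, and the use of a `Semaphore` to stand in for the master's table lock are all assumptions made for the sketch; a single-thread pool with one long-running task simulates a busy executor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in HBase.
public class SnapshotLockDelaySketch {

  /** Returns how long a simulated ASSIGN waited for the table lock, in milliseconds. */
  public static long measureAssignWaitMillis(long executorBacklogMillis) throws Exception {
    // One permit models the table's EXCLUSIVE lock. A Semaphore is used because,
    // unlike a thread-owned lock, it may be released by a different thread.
    Semaphore tableLock = new Semaphore(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      // Simulated backlog: the only worker is busy with an earlier task.
      CountDownLatch backlogStarted = new CountDownLatch(1);
      pool.execute(() -> {
        backlogStarted.countDown();
        sleepQuietly(executorBacklogMillis);
      });
      backlogStarted.await();

      // "prepare": take the exclusive lock synchronously on the caller's thread.
      tableLock.acquire();
      // "process": queued behind the backlog; the lock is released only once it starts.
      pool.execute(() -> tableLock.release());

      // Simulated ASSIGN: needs the same lock, so it waits until "process" starts.
      long start = System.nanoTime();
      boolean acquired =
          tableLock.tryAcquire(executorBacklogMillis + 5_000, TimeUnit.MILLISECONDS);
      long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      if (!acquired) {
        throw new IllegalStateException("exclusive lock was never released");
      }
      tableLock.release();
      return waitedMs;
    } finally {
      pool.shutdownNow();
    }
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("ASSIGN waited ~" + measureAssignWaitMillis(300) + " ms");
  }
}
```

In this sketch the simulated ASSIGN waits for roughly the length of the executor backlog, even though the work done under the exclusive lock is trivial.
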
> Because the process stage runs in an ExecutorService, its start is sometimes 
> delayed, so the snapshot holds the EXCLUSIVE lock for too long and blocks the 
> ASSIGN child procedures of SCP (ServerCrashProcedure).
> Below are the logs we saw for this in production:
> 1. The process stage of the snapshot started:
> {quote}2025-01-15 02:15:09,608 INFO  
> [ER_SNAPSHOT_OPERATIONS-master/hmaster-3:60000-5360] 
> snapshot.TakeSnapshotHandler - Running SKIPFLUSH table snapshot 
> <TABLE_NAME>_1736902854385_1736906318965_0 C_M_SNAPSHOT_TABLE on table 
> <TABLE_NAME>
> {quote}
> 2. Then the EXCLUSIVE lock was released:
> {quote}2025-01-15 02:15:09,615 INFO  [PEWorker-20] 
> procedure2.ProcedureExecutor - Finished pid=26291867, state=SUCCESS; 
> org.apache.hadoop.hbase.master.locking.LockProcedure, tableName=<TABLE_NAME>, 
> type=EXCLUSIVE in 9 mins, 55.422 sec
> {quote}
> 3. The ASSIGN of the SCP was then able to get the lock:
> {quote}2025-01-15 02:15:09,615 INFO  [PEWorker-20] 
> procedure.MasterProcedureScheduler - Took xlock for pid=26292238, 
> ppid=26291869, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=<TABLE_NAME>, 
> region=e9d104da353c65725b33f902c8a1449d, ASSIGN
> {quote}
> 4. Once the lock was acquired the ASSIGN was quick, but the total time for 
> the region to become available shot up to 9 mins:
> {quote}2025-01-15 02:15:10,384 INFO  [PEWorker-24] 
> procedure2.ProcedureExecutor - Finished pid=26292238, ppid=26291869, 
> state=SUCCESS; TransitRegionStateProcedure table=<TABLE_NAME>, 
> region=e9d104da353c65725b33f902c8a1449d, ASSIGN in 9 mins, 1.364 sec
> {quote}
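
The timeline implied by the four log excerpts above can be worked out with a little arithmetic. The sketch below does so in Java, assuming that the "in 9 mins, ..." durations are measured from procedure submission to procedure finish; that reading, like the class and method names, is an assumption and not something the logs state.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; timestamps and durations are copied from the quoted logs.
public class SnapshotLockTimeline {
  private static final DateTimeFormatter LOG_TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  private static LocalDateTime ts(String s) {
    return LocalDateTime.parse(s, LOG_TS);
  }

  // LockProcedure finished at 02:15:09,615 after "9 mins, 55.422 sec".
  static final LocalDateTime LOCK_RELEASED = ts("2025-01-15 02:15:09,615");
  static final Duration LOCK_HELD = Duration.ofMinutes(9).plusMillis(55_422);

  // ASSIGN took the xlock at 02:15:09,615 and finished at 02:15:10,384
  // after "9 mins, 1.364 sec" in total.
  static final LocalDateTime ASSIGN_TOOK_LOCK = ts("2025-01-15 02:15:09,615");
  static final LocalDateTime ASSIGN_FINISHED = ts("2025-01-15 02:15:10,384");
  static final Duration ASSIGN_TOTAL = Duration.ofMinutes(9).plusMillis(1_364);

  /** Approximate time at which the snapshot's EXCLUSIVE lock procedure began. */
  public static LocalDateTime lockAcquired() {
    return LOCK_RELEASED.minus(LOCK_HELD);
  }

  /** Time the ASSIGN spent waiting for the table lock. */
  public static Duration assignWaitedForLock() {
    LocalDateTime assignStarted = ASSIGN_FINISHED.minus(ASSIGN_TOTAL);
    return Duration.between(assignStarted, ASSIGN_TOOK_LOCK);
  }

  /** Time the ASSIGN needed once it held the lock. */
  public static Duration assignAfterLock() {
    return Duration.between(ASSIGN_TOOK_LOCK, ASSIGN_FINISHED);
  }

  public static void main(String[] args) {
    System.out.println("lock procedure began:   " + lockAcquired());
    System.out.println("ASSIGN waited for lock: " + assignWaitedForLock().toMillis() + " ms");
    System.out.println("ASSIGN work after lock: " + assignAfterLock().toMillis() + " ms");
  }
}
```

Under that assumption, the ASSIGN spent about 9 minutes 0.6 seconds waiting for the lock and under a second doing its actual work once the lock was granted.
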



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
