[ https://issues.apache.org/jira/browse/HBASE-26715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505079#comment-17505079 ]

Hudson commented on HBASE-26715:
--------------------------------

Results for branch branch-2.5
        [build #60 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/60/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/60/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/60/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/60/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/60/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Blocked on SyncFuture in AsyncProtobufLogWriter#write
> -----------------------------------------------------
>
>                 Key: HBASE-26715
>                 URL: https://issues.apache.org/jira/browse/HBASE-26715
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>              Labels: WAL
>             Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.11
>
>
> Ran into an issue on hbase 2.4.6, I think related to HBASE-26679. Individual 
> writes are blocking on SyncFuture, which never gets completed. Eventually 
> (5m) the writes time out and fail, but the regionserver hung like this 
> basically forever until I killed it about 14 hours later. While HBASE-26679 
> may fix the hang bug, I think we should have additional protection against 
> such zombie states. In this case I think what happened is that a WAL roll 
> was requested due to failed appends, but it also hung forever. See the 
> stack trace below:
>  
> {code:java}
> Thread 240 (regionserver/host:60020.logRoller):
>   State: WAITING
>   Blocked count: 38
>   Waited count: 293
>   Waiting on java.util.concurrent.CompletableFuture$Signaller@13342c6d
>   Stack:
>     [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
>     [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
>     [email protected]/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1796)
>     [email protected]/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128)
>     [email protected]/java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1823)
>     [email protected]/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1998)
>     app//org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(AsyncProtobufLogWriter.java:189)
>     app//org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeMagicAndWALHeader(AsyncProtobufLogWriter.java:202)
>     app//org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:170)
>     app//org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:113)
>     app//org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:669)
>     app//org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:130)
>     app//org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:841)
>     app//org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:268)
>     app//org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:187)
>  {code}
>  
> The WAL roller thread was stuck on this wait seemingly forever, so it was 
> never able to roll the WAL and get writes working again. I think we should 
> add a timeout here, and abort the regionserver if a WAL cannot be rolled in 
> a timely manner.
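
The quoted description proposes bounding the wait and aborting when the WAL cannot make progress. A minimal sketch of that idea, assuming a hypothetical awaitWrite helper and WRITE_TIMEOUT_MS constant (illustrative names, not the actual HBase fix): the unbounded future.get() seen at AsyncProtobufLogWriter.write(AsyncProtobufLogWriter.java:189) in the trace would become a deadline-bound get that escalates on timeout.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class BoundedWalWait {
  // Hypothetical deadline; a real patch would read this from configuration.
  private static final long WRITE_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(5);

  // Bounded variant of the blocking wait: instead of future.get(), which
  // can park the roller thread forever, wait with a deadline and surface
  // the timeout so the caller can abort the regionserver rather than hang.
  static long awaitWrite(CompletableFuture<Long> writeFuture)
      throws ExecutionException, InterruptedException {
    try {
      return writeFuture.get(WRITE_TIMEOUT_MS, TimeUnit.MILLISECONDS);
    } catch (TimeoutException te) {
      // The SyncFuture never completed; escalate instead of blocking forever.
      throw new IllegalStateException(
          "WAL write did not complete within " + WRITE_TIMEOUT_MS + " ms", te);
    }
  }
}
{code}

A caller on the roll path could catch the IllegalStateException and invoke the regionserver's abort hook, matching the proposal above; the helper and constant here are assumptions for illustration only.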



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
