[ https://issues.apache.org/jira/browse/HBASE-28031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792113#comment-17792113 ]

Bryan Beaudreault commented on HBASE-28031:
-------------------------------------------

OK, so for that one, the test causing all that spam (testUserTableClusterScopeQuota) actually succeeds. The job times out during testUserNamespaceClusterScopeQuota. According to the thread dump taken at the end, it's stuck while trying to refresh the quota cache:
{code:java}
"Listener at localhost/42451" daemon prio=5 tid=18 runnable
java.lang.Thread.State: RUNNABLE
        at javax.security.auth.Subject.getSubject(Subject.java:297)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:577)
        at org.apache.hadoop.hbase.security.User$SecureHadoopUser.<init>(User.java:280)
        at org.apache.hadoop.hbase.security.User.getCurrent(User.java:160)
        at org.apache.hadoop.hbase.quotas.ThrottleQuotaTestUtil.triggerCacheRefresh(ThrottleQuotaTestUtil.java:156)
        at org.apache.hadoop.hbase.quotas.ThrottleQuotaTestUtil.triggerUserCacheRefresh(ThrottleQuotaTestUtil.java:109)
        at org.apache.hadoop.hbase.quotas.TestClusterScopeQuotaThrottle.testUserNamespaceClusterScopeQuota(TestClusterScopeQuotaThrottle.java:197)
{code}
Looks like there's a [while loop in there|https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/ThrottleQuotaTestUtil.java#L150-L184] which doesn't have a timeout. We should probably update it to use Waiter with a timeout, so at least the test fails with a useful message instead of hanging until the job times out.
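To sketch the idea (this is only illustrative, not the actual ThrottleQuotaTestUtil change; it mirrors the shape of HBase's test utility Waiter.waitFor, which polls a predicate up to a deadline):

{code:java}
import java.util.function.BooleanSupplier;

// Sketch only: a bounded poll loop in the spirit of o.a.h.hbase.Waiter.waitFor.
// Class and parameter names here are illustrative, not from the real patch.
public class BoundedWait {

  /** Polls condition every intervalMs until it holds or timeoutMs elapses. */
  public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() >= deadlineNanos) {
        return false; // caller can now fail the test with a clear assertion
      }
      Thread.sleep(intervalMs);
    }
    return true;
  }
}
{code}

With something like this, a cache that never reaches the expected state turns into a failed assertion after the timeout rather than a surefire-level hang.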

The while loop triggers the QuotaRefresherChore. According to the thread dump, 
that chore is waiting on a future:
{code:java}
"regionserver/jenkins-hbase19:0.Chore.1" daemon prio=5 tid=366 in Object.wait()
java.lang.Thread.State: WAITING (on object monitor)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:182)
        at org.apache.hadoop.hbase.client.TableOverAsyncTable.get(TableOverAsyncTable.java:193)
        at org.apache.hadoop.hbase.quotas.QuotaTableUtil.doGet(QuotaTableUtil.java:910)
        at org.apache.hadoop.hbase.quotas.QuotaUtil.fetchGlobalQuotas(QuotaUtil.java:375)
        at org.apache.hadoop.hbase.quotas.QuotaUtil.fetchNamespaceQuotas(QuotaUtil.java:342)
        at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore$1.fetchEntries(QuotaCache.java:274)
        at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetch(QuotaCache.java:365)
        at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetchNamespaceQuotaState(QuotaCache.java:266)
        at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:257)
{code}
This is just a point-in-time snapshot, and I don't see any exceptions. I don't know whether that Future is actually blocked, or whether it returns quickly but without what triggerCacheRefresh is looking for. We might need more logging. There's a LOG.debug("QuotaCache") dump _after_ the while loop; maybe we should move it to _inside_ the while loop.
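Roughly, the reordering would look like this (a hypothetical sketch with made-up helper names, since the real loop lives in ThrottleQuotaTestUtil.triggerCacheRefresh):

{code:java}
// Hypothetical sketch, not the actual ThrottleQuotaTestUtil code.
// Dumping the observed cache state on every iteration, instead of only after
// the loop exits, would show what the chore actually fetched each time.
while (!cacheContainsExpectedQuota()) {
  triggerQuotaRefresherChore();
  LOG.debug("QuotaCache: {}", dumpQuotaCache()); // moved inside the loop
  Thread.sleep(250);
}
{code}

Then a hung run would at least leave a trail of what the cache held on each refresh attempt.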

> TestClusterScopeQuotaThrottle is still failing with broken WAL writer
> ---------------------------------------------------------------------
>
>                 Key: HBASE-28031
>                 URL: https://issues.apache.org/jira/browse/HBASE-28031
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>
> {noformat}
> 2023-08-17T10:47:31,026 WARN  [regionserver/jenkins-hbase19:0.logRoller {}] asyncfs.FanOutOneBlockAsyncDFSOutputHelper(515): create fan-out dfs output /user/jenkins/test-data/bb8017fa-92f5-92c9-2f1d-aa9b90cf4b80/WALs/jenkins-hbase19.apache.org,43363,1692269230784/jenkins-hbase19.apache.org%2C43363%2C1692269230784.meta.1692433272886.meta failed, retry = 0
> org.apache.hadoop.ipc.RemoteException: File /user/jenkins/test-data/bb8017fa-92f5-92c9-2f1d-aa9b90cf4b80/WALs/jenkins-hbase19.apache.org,43363,1692269230784/jenkins-hbase19.apache.org%2C43363%2C1692269230784.meta.1692433272886.meta could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2276)
>       at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2820)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:910)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
>       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2960)
>       at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.ipc.Client.call(Client.java:1558) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.ipc.Client.call(Client.java:1455) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) ~[hadoop-common-3.2.4.jar:?]
>       at com.sun.proxy.$Proxy41.addBlock(Unknown Source) ~[?:?]
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520) ~[hadoop-hdfs-client-3.2.4.jar:?]
>       at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) ~[hadoop-common-3.2.4.jar:?]
>       at com.sun.proxy.$Proxy42.addBlock(Unknown Source) ~[?:?]
>       at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
>       at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361) ~[classes/:?]
>       at com.sun.proxy.$Proxy45.addBlock(Unknown Source) ~[?:?]
>       at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
>       at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361) ~[classes/:?]
>       at com.sun.proxy.$Proxy45.addBlock(Unknown Source) ~[?:?]
>       at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:492) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>       at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:120) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>       at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$7.doCall(FanOutOneBlockAsyncDFSOutputHelper.java:569) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>       at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$7.doCall(FanOutOneBlockAsyncDFSOutputHelper.java:564) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.2.4.jar:?]
>       at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:577) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>       at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:54) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>       at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:183) ~[classes/:?]
>       at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:167) ~[classes/:?]
>       at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:114) ~[classes/:?]
>       at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createAsyncWriter(AsyncFSWAL.java:241) ~[classes/:?]
>       at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:247) ~[classes/:?]
>       at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:104) ~[classes/:?]
>       at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:1050) ~[classes/:?]
>       at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$9(AbstractFSWAL.java:1082) ~[classes/:?]
>       at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>       at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:1082) ~[classes/:?]
>       at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:311) ~[classes/:?]
>       at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:212) ~[classes/:?]
> {noformat}
> Need to dig more.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
