[
https://issues.apache.org/jira/browse/HBASE-28031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791809#comment-17791809
]
Bryan Beaudreault commented on HBASE-28031:
-------------------------------------------
I'm actually not sure that exception is the problem. If you look at the logs,
we won't honor that 6m backoff:
{code:java}
2023-11-16T04:46:41,033 DEBUG [RPCClient-NioEventLoopGroup-5-2 {}]
backoff.HBaseServerExceptionPauseManager(61): RpcThrottlingException suggested
pause of 360000000000ns which would exceed the timeout. We should throw instead.
org.apache.hadoop.hbase.quotas.RpcThrottlingException:
org.apache.hadoop.hbase.quotas.RpcThrottlingException: number of read requests
exceeded - wait 6mins, 0ms {code}
Since the backoff exceeds the configured rpc timeout, the client throws instead
of pausing. The doGets method then catches and logs the IOException:
{code:java}
2023-11-16T04:46:41,026 ERROR [Listener at localhost/37593 {}]
quotas.ThrottleQuotaTestUtil(100): get failed after nRetries=10
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=1, exceptions:
2023-11-16T05:54:26.933Z,
org.apache.hadoop.hbase.quotas.RpcThrottlingException:
org.apache.hadoop.hbase.quotas.RpcThrottlingException: number of read requests
exceeded - wait 6mins, 0ms {code}
Notice attempts=1, so on the first attempt it bails out for the above "We
should throw instead" reason.
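In other words, the client compares the throttle's suggested pause against the time remaining before the rpc/operation timeout. A rough sketch of that decision (illustrative names and a made-up 60s timeout, not the actual HBaseServerExceptionPauseManager API):
{code:java}
import java.util.OptionalLong;

public class PauseOrThrowSketch {
  // Illustrative only: returns the pause to honor, or empty to mean
  // "throw the RpcThrottlingException instead of waiting".
  static OptionalLong pauseNsOrThrow(long suggestedPauseNs, long remainingTimeNs) {
    if (suggestedPauseNs > remainingTimeNs) {
      // Honoring the pause would blow past the timeout, so fail fast.
      return OptionalLong.empty();
    }
    return OptionalLong.of(suggestedPauseNs);
  }

  public static void main(String[] args) {
    long suggested = 360_000_000_000L; // the 6-minute pause from the log, in ns
    long remaining = 60_000_000_000L;  // assumed 60s timeout, for illustration
    System.out.println(pauseNsOrThrow(suggested, remaining).isPresent()); // false: throw
  }
}
{code}
With the 6-minute pause far larger than any sane rpc timeout, the empty branch is always taken here, which is why the retry loop never gets a second attempt.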
Anyway, the reason we are seeing 6 minutes here is by design for the test:
* The throttle is set to 20 req/{*}hour{*}
* ClusterThrottle works by dividing the total throttle by the number of
regions.
* A server with 1 region can do 10 req/hour, and the test tries sending 20 requests
* So the first 10 requests succeed, but they consume the entire throttle. At 10
req/hour, it takes 60/10 = 6 minutes before another request is allowed
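The arithmetic above can be checked with a quick sketch (the numbers come from the test setup described in the bullets; the helper names are made up, not HBase APIs):
{code:java}
public class ClusterThrottleMath {
  // Cluster-scope throttle: total limit split evenly across regions.
  static long perRegionReqPerHour(long clusterReqPerHour, int numRegions) {
    return clusterReqPerHour / numRegions;
  }

  // With N req/hour allowed, the refill interval is 60/N minutes.
  static long minutesBetweenRequests(long reqPerHour) {
    return 60 / reqPerHour;
  }

  public static void main(String[] args) {
    long perRegion = perRegionReqPerHour(20, 2); // 20 req/hour over 2 regions -> 10
    System.out.println(minutesBetweenRequests(perRegion)); // 6 minutes = 360000000000ns
  }
}
{code}
That 6-minute refill interval is exactly the 360000000000ns pause the RpcThrottlingException suggests in the first log snippet.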
Why are we seeing so many JVM pauses?
{code:java}
2023-11-16T04:46:41,361 WARN [M:0;jenkins-hbase19:38849 {}] util.Sleeper(86):
We slept 208100ms instead of 100ms, this is likely due to a long garbage
collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired {code}
That's just one of many.
> TestClusterScopeQuotaThrottle is still failing with broken WAL writer
> ---------------------------------------------------------------------
>
> Key: HBASE-28031
> URL: https://issues.apache.org/jira/browse/HBASE-28031
> Project: HBase
> Issue Type: Sub-task
> Components: test
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
>
> {noformat}
> 2023-08-17T10:47:31,026 WARN [regionserver/jenkins-hbase19:0.logRoller {}]
> asyncfs.FanOutOneBlockAsyncDFSOutputHelper(515): create fan-out dfs output
> /user/jenkins/test-data/bb8017fa-92f5-92c9-2f1d-aa9b90cf4b80/WALs/jenkins-hbase19.apache.org,43363,1692269230784/jenkins-hbase19.apache.org%2C43363%2C1692269230784.meta.1692433272886.meta
> failed, retry = 0
> org.apache.hadoop.ipc.RemoteException: File
> /user/jenkins/test-data/bb8017fa-92f5-92c9-2f1d-aa9b90cf4b80/WALs/jenkins-hbase19.apache.org,43363,1692269230784/jenkins-hbase19.apache.org%2C43363%2C1692269230784.meta.1692433272886.meta
> could only be written to 0 of the 1 minReplication nodes. There are 2
> datanode(s) running and 2 node(s) are excluded in this operation.
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2276)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2820)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:910)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2960)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612)
> ~[hadoop-common-3.2.4.jar:?]
> at org.apache.hadoop.ipc.Client.call(Client.java:1558)
> ~[hadoop-common-3.2.4.jar:?]
> at org.apache.hadoop.ipc.Client.call(Client.java:1455)
> ~[hadoop-common-3.2.4.jar:?]
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
> ~[hadoop-common-3.2.4.jar:?]
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> ~[hadoop-common-3.2.4.jar:?]
> at com.sun.proxy.$Proxy41.addBlock(Unknown Source) ~[?:?]
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520)
> ~[hadoop-hdfs-client-3.2.4.jar:?]
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_362]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
> ~[hadoop-common-3.2.4.jar:?]
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> ~[hadoop-common-3.2.4.jar:?]
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> ~[hadoop-common-3.2.4.jar:?]
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> ~[hadoop-common-3.2.4.jar:?]
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> ~[hadoop-common-3.2.4.jar:?]
> at com.sun.proxy.$Proxy42.addBlock(Unknown Source) ~[?:?]
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_362]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
> at
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> ~[classes/:?]
> at com.sun.proxy.$Proxy45.addBlock(Unknown Source) ~[?:?]
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_362]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
> at
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> ~[classes/:?]
> at com.sun.proxy.$Proxy45.addBlock(Unknown Source) ~[?:?]
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:492)
> ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:120)
> ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$7.doCall(FanOutOneBlockAsyncDFSOutputHelper.java:569)
> ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$7.doCall(FanOutOneBlockAsyncDFSOutputHelper.java:564)
> ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> ~[hadoop-common-3.2.4.jar:?]
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:577)
> ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:54)
> ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:183)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:167)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:114)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createAsyncWriter(AsyncFSWAL.java:241)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:247)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:104)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:1050)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$9(AbstractFSWAL.java:1082)
> ~[classes/:?]
> at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
> ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:1082)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:311)
> ~[classes/:?]
> at
> org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:212)
> ~[classes/:?]
> {noformat}
> Need to dig more.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)