[
https://issues.apache.org/jira/browse/HBASE-28027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755504#comment-17755504
]
Duo Zhang commented on HBASE-28027:
-----------------------------------
After applying the PR, it succeeded for one run, but still failed in another
run.
https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/11677/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.quotas.TestClusterScopeQuotaThrottle-output.txt
Looking at the log, it seems there are log rolling errors which eventually bring
down all the region servers...
{noformat}
2023-08-17T10:47:31,026 WARN [regionserver/jenkins-hbase19:0.logRoller {}] asyncfs.FanOutOneBlockAsyncDFSOutputHelper(515): create fan-out dfs output /user/jenkins/test-data/bb8017fa-92f5-92c9-2f1d-aa9b90cf4b80/WALs/jenkins-hbase19.apache.org,43363,1692269230784/jenkins-hbase19.apache.org%2C43363%2C1692269230784.meta.1692433272886.meta failed, retry = 0
org.apache.hadoop.ipc.RemoteException: File /user/jenkins/test-data/bb8017fa-92f5-92c9-2f1d-aa9b90cf4b80/WALs/jenkins-hbase19.apache.org,43363,1692269230784/jenkins-hbase19.apache.org%2C43363%2C1692269230784.meta.1692433272886.meta could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2276)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2820)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:910)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2960)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1558) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1455) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) ~[hadoop-common-3.2.4.jar:?]
	at com.sun.proxy.$Proxy41.addBlock(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520) ~[hadoop-hdfs-client-3.2.4.jar:?]
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) ~[hadoop-common-3.2.4.jar:?]
	at com.sun.proxy.$Proxy42.addBlock(Unknown Source) ~[?:?]
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
	at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361) ~[classes/:?]
	at com.sun.proxy.$Proxy45.addBlock(Unknown Source) ~[?:?]
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
	at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361) ~[classes/:?]
	at com.sun.proxy.$Proxy45.addBlock(Unknown Source) ~[?:?]
	at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:492) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
	at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:120) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
	at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$7.doCall(FanOutOneBlockAsyncDFSOutputHelper.java:569) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
	at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$7.doCall(FanOutOneBlockAsyncDFSOutputHelper.java:564) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:577) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
	at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:54) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
	at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:183) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:167) ~[classes/:?]
	at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:114) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createAsyncWriter(AsyncFSWAL.java:241) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:247) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:104) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:1050) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$9(AbstractFSWAL.java:1082) ~[classes/:?]
	at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:1082) ~[classes/:?]
	at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:311) ~[classes/:?]
	at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:212) ~[classes/:?]
{noformat}
Not sure what the problem is...
Let's keep an eye on later runs...
> Make TestClusterScopeQuotaThrottle run faster
> ---------------------------------------------
>
> Key: HBASE-28027
> URL: https://issues.apache.org/jira/browse/HBASE-28027
> Project: HBase
> Issue Type: Sub-task
> Components: Quotas, test
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
>
> -The test always times out and it has several test methods.
> Let's split the test into several smaller tests, so we can find out which one
> is the culprit.-
> Finally I found that the problem is we do not limit the operation timeout,
> we only set the max retry number. After some new improvements came in,
> sometimes we may get a 6 minute sleep time, and no doubt we will hit the test
> timeout...
> I changed the test a bit to set the operation timeout on the Table instance we
> use, so it fails immediately when we hit the quota throttling, and now the
> tests finish very quickly.
> I think we could add another E2E test to make sure that the refilling works
> as expected.
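The retry-versus-timeout interaction described above can be sketched in a few lines. This is a minimal, self-contained illustration, not HBase's actual retry code: the pause value, the doubling backoff, and the retry counts below are invented for the example. It shows why capping only the retry count still permits minutes of cumulative backoff sleep, while a per-operation timeout bounds the total and lets the call fail fast.

```java
// Illustrative only: values and backoff curve are made up, not HBase's defaults.
public class RetryBudget {

    // Simulated per-retry sleep: pauseMs * 2^attempt (generic exponential backoff).
    static long sleepForAttempt(long pauseMs, int attempt) {
        return pauseMs * (1L << attempt);
    }

    // Total sleep when only a max retry count is configured: grows without any
    // wall-clock bound as the retry count rises.
    static long totalSleepWithRetryCountOnly(long pauseMs, int maxRetries) {
        long total = 0;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            total += sleepForAttempt(pauseMs, attempt);
        }
        return total;
    }

    // With an operation timeout, retrying stops as soon as the next sleep would
    // exceed the budget, so the caller fails fast instead of blocking the test.
    static long totalSleepWithOperationTimeout(long pauseMs, int maxRetries,
            long operationTimeoutMs) {
        long total = 0;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            long sleep = sleepForAttempt(pauseMs, attempt);
            if (total + sleep > operationTimeoutMs) {
                break; // budget exhausted: give up instead of sleeping on
            }
            total += sleep;
        }
        return total;
    }

    public static void main(String[] args) {
        // 12 retries at 100 ms base pause: 100 * (2^12 - 1) = 409500 ms, ~6.8 min.
        System.out.println(totalSleepWithRetryCountOnly(100, 12));
        // Same retries under a 5 s operation timeout: 100+200+400+800+1600 = 3100 ms.
        System.out.println(totalSleepWithOperationTimeout(100, 12, 5000));
    }
}
```

In HBase itself, the equivalent knob is the client operation timeout that can be set per Table (e.g. via the client's `TableBuilder.setOperationTimeout`), which is what the change to the test relies on.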
--
This message was sent by Atlassian Jira
(v8.20.10#820010)