[https://issues.apache.org/jira/browse/HDFS-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920156#comment-16920156]
Chen Zhang commented on HDFS-14654:
-----------------------------------
Thanks [~ayushtkn] for the review. The {{TestRouterRpc}} failure in the last build is caused by:
{code:java}
[ERROR] Tests run: 41, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 42.946 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouterRpc
[ERROR] testErasureCoding(org.apache.hadoop.hdfs.server.federation.router.TestRouterRpc)  Time elapsed: 0.836 s  <<< ERROR!
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /testec/testfile2 could only be written to 5 of the 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2222)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2836)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
{code}
There isn't enough logging to figure out why the block allocation failed, but it seems unrelated to this Jira, which fixes {{testNamenodeMetrics}}. I'll re-trigger the build.
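For context on the numbers in the message, assume the NameNode rejects a striped block allocation when it picks fewer targets than the policy has data units. RS-6-3-1024k has 6 data units and 3 parity units, so under that assumption the failure reduces to 5 < 6. The minimal sketch below reproduces only that arithmetic. {{StripedAllocationCheck}} and its methods are hypothetical names made up for illustration, not Hadoop APIs, and the sketch models none of the actual placement logic.

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hadoop API. It only models the arithmetic in
// "could only be written to X of the Y required nodes for <policy>".
class StripedAllocationCheck {
  // Policy names like "RS-6-3-1024k": codec, data units, parity units, cell size.
  private static final Pattern POLICY =
      Pattern.compile("([A-Z]+)-(\\d+)-(\\d+)-(\\d+)k");

  private static int unitCount(String policyName, int group) {
    Matcher m = POLICY.matcher(policyName);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized policy: " + policyName);
    }
    return Integer.parseInt(m.group(group));
  }

  static int dataUnits(String policyName) {
    return unitCount(policyName, 2);
  }

  static int parityUnits(String policyName) {
    return unitCount(policyName, 3);
  }

  // Assumption: a new block group needs at least one target per data unit.
  static boolean allocationSucceeds(String policyName, int chosenTargets) {
    return chosenTargets >= dataUnits(policyName);
  }

  public static void main(String[] args) {
    String policy = "RS-6-3-1024k";
    // Prints "data=6, parity=3"
    System.out.println("data=" + dataUnits(policy) + ", parity=" + parityUnits(policy));
    // Prints "5 targets ok? false" (the situation in the trace)
    System.out.println("5 targets ok? " + allocationSucceeds(policy, 5));
    // Prints "6 targets ok? true"
    System.out.println("6 targets ok? " + allocationSucceeds(policy, 6));
  }
}
{code}

The sketch does not explain why only 5 targets were chosen. That is exactly the part the logs don't show.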
{quote}Anyway, is there any way to repro the failure? I will try running this locally a couple of times to confirm too, provided {{TestRouterFaultTolerant}} passes.
{quote}
Another Jira, HDFS-14742, tracks the cause of this UT's failure. [~elgoiri] analyzed it there: "The problem with this test is that there is a couple random variables that in some cases end up with all the files in one subcluster." Since the failure doesn't look easy to reproduce, let's track it in HDFS-14742.
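The sketch below shows why a couple of random choices can occasionally put every file in one subcluster, and why that makes a failure hard to reproduce on demand. It is a hypothetical model, not the actual test code. Suppose each of n files is independently assigned to one of k subclusters uniformly at random. Then the chance that all of them land in the same subcluster is k * (1/k)^n = (1/k)^(n-1). The class name {{SubclusterSkew}} and the file and subcluster counts are illustrative assumptions, not values taken from the test.

{code:java}
import java.util.Random;

// Hypothetical model of uniform random file placement across subclusters.
// It does not reproduce the real test; it only illustrates the probability
// argument quoted above.
class SubclusterSkew {

  // Exact probability that n uniform random placements over k subclusters
  // all pick the same subcluster: k * (1/k)^n = (1/k)^(n - 1).
  static double probabilityAllInOne(int subclusters, int files) {
    if (subclusters < 1 || files < 1) {
      throw new IllegalArgumentException("need at least 1 subcluster and 1 file");
    }
    return Math.pow(1.0 / subclusters, files - 1);
  }

  // Monte Carlo estimate of the same probability, seeded for repeatability.
  static double simulate(int subclusters, int files, int trials, long seed) {
    Random random = new Random(seed);
    int allInOne = 0;
    for (int t = 0; t < trials; t++) {
      int first = random.nextInt(subclusters);
      boolean same = true;
      for (int f = 1; f < files && same; f++) {
        same = random.nextInt(subclusters) == first;
      }
      if (same) {
        allInOne++;
      }
    }
    return (double) allInOne / trials;
  }

  public static void main(String[] args) {
    // Illustrative parameters only: 2 subclusters, 3 files.
    // The exact value is 0.25; the simulated value should be close to it.
    System.out.println("exact=" + probabilityAllInOne(2, 3)
        + " simulated=" + simulate(2, 3, 100_000, 42L));
  }
}
{code}

Even with only a handful of random choices, a fully skewed layout is far from rare. This is one way a test can pass on some runs and fail on others.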
> RBF: TestRouterRpc tests are flaky
> ----------------------------------
>
> Key: HDFS-14654
> URL: https://issues.apache.org/jira/browse/HDFS-14654
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Takanobu Asanuma
> Assignee: Chen Zhang
> Priority: Major
> Attachments: HDFS-14654.001.patch, HDFS-14654.002.patch,
> HDFS-14654.003.patch, HDFS-14654.004.patch, error.log
>
>
> They sometimes pass and sometimes fail.