[ https://issues.apache.org/jira/browse/HDFS-13757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553488#comment-16553488 ]
Wei-Chiu Chuang commented on HDFS-13757:
----------------------------------------
It wouldn't fail with the "Negative replicas!" assertion before HDFS-12886.
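For context, the assertion that trips is the guard at the top of LowRedundancyBlocks#getPriority(). Paraphrased (a sketch, not an exact copy of the source):

{code:java}
// Paraphrased sketch of LowRedundancyBlocks#getPriority: the priority
// computation assumes the caller never passes a negative live-replica count.
private int getPriority(BlockInfo block, int curReplicas,
    int readOnlyReplicas, int decommissionedReplicas,
    int expectedReplicas) {
  assert curReplicas >= 0 : "Negative replicas!";
  // ... maps the replica counts to a reconstruction priority level ...
}
{code}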
> After HDFS-12886, close() can throw AssertionError "Negative replicas!"
> -----------------------------------------------------------------------
>
> Key: HDFS-13757
> URL: https://issues.apache.org/jira/browse/HDFS-13757
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
> Reporter: Wei-Chiu Chuang
> Priority: Major
> Attachments: HDFS-13757.test.02.patch, HDFS-13757.test.patch
>
>
> While investigating a data corruption bug caused by concurrent recoverLease()
> and close(), I found that HDFS-12886 may cause close() to throw an
> AssertionError in a corner case: the block has zero live replicas, and the
> client calls recoverLease() immediately followed by close().
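> The triggering call pattern from the client side is roughly the following
> (a minimal sketch, assuming the last block's only replica is already gone;
> this is not the attached test verbatim):
>
> {code:java}
> // Sketch: 'conf' points at a running cluster; names here are illustrative.
> DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
> Path p = new Path("/recover-lease-race");
> FSDataOutputStream out = dfs.create(p, (short) 1);
> out.write(1);
> out.hflush();
> // ... the DataNode holding the replica dies here, leaving zero live replicas ...
> dfs.recoverLease(p); // NameNode starts recovery of the last block
> out.close();         // completeFile() runs while the block is under recovery
> {code}
>
> The close() then fails on the NameNode with: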
> {noformat}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Negative replicas!
> at org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.getPriority(LowRedundancyBlocks.java:197)
> at org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.update(LowRedundancyBlocks.java:422)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReconstructions(BlockManager.java:4274)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.commitOrCompleteLastBlock(BlockManager.java:1001)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3471)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:713)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:671)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2854)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:928)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:607)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I have a test case to reproduce it.
> [~lukmajercak] [~elgoiri] would you please take a look at it? I think we
> should add a check to reject completeFile() if the block is under recovery,
> similar to what's proposed in HDFS-10240.
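> Something along these lines in FSDirWriteFileOp#completeFileInternal() is
> what I have in mind (a hypothetical sketch, not an actual patch):
>
> {code:java}
> // Hypothetical guard (sketch): refuse to complete the file while its last
> // block is still being recovered, forcing the client to retry instead.
> final BlockInfo lastBlock = pendingFile.getLastBlock();
> if (lastBlock != null
>     && lastBlock.getBlockUCState() == BlockUCState.UNDER_RECOVERY) {
>   throw new RetriableException(
>       "Rejecting completeFile: last block of " + src + " is under recovery");
> }
> {code}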