[
https://issues.apache.org/jira/browse/HDFS-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025029#comment-16025029
]
Ravi Prakash commented on HDFS-11852:
-------------------------------------
Thank you for the pointer, Kihwal!
In our case, the only available replica was on the decommissioning node. I'm
guessing one of the other datanodes was decommissioned successfully and a
second one perhaps failed. In that case, HDFS-11499 will likely not recover
the under-replicated block. However, it would reduce the likelihood of reaching
that state, so I agree with closing this JIRA as a duplicate of HDFS-11499.
> Under-replicated block never completes because of failure in
> commitBlockSynchronization()
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-11852
> URL: https://issues.apache.org/jira/browse/HDFS-11852
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Ravi Prakash
>
> Credit goes to Charles Wimmer and Karthik Kumar for pointing me to this issue.
> We noticed a block was holding up decommissioning because recovery failed.
> (The stack trace below is from when the cluster was running 2.7.2.) DN2 and
> DN3 are no longer part of the cluster. DN1 is the node held up in
> decommissioning. I checked that the block and meta files are indeed in the
> finalized directory.
> {code}
> 2016-09-19 09:02:25,837 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: recoverBlocks FAILED: RecoveringBlock{BP-<someid>:blk_1094097355_20357090; getBlockSize()=0; corrupt=false; offset=-1; locs=[DatanodeInfoWithStorage[<DN1>:50010,null,null], DatanodeInfoWithStorage[<DN2>:50010,null,null], DatanodeInfoWithStorage[<DN3>:50010,null,null]]}
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize INodeFile <filename> since blocks[0] is non-complete, where blocks=[blk_1094097355_20552508{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=0, replicas=[ReplicaUC[[DISK]DS-03bed13e-5cdd-4207-91b6-abd83f9eb7d3:NORMAL:<DN1>:50010|RBW]]}].
> at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:222)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.toCompleteFile(INodeFile.java:209)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.finalizeINodeFileUnderConstruction(FSNamesystem.java:4218)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4457)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4419)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:837)
> at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:291)
> at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28768)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy16.commitBlockSynchronization(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolClientSideTranslatorPB.java:312)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:2780)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2642)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:243)
> at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2519)
> at java.lang.Thread.run(Thread.java:744){code}
> On the namenode side:
> {code}
> 2016-09-19 09:02:25,835 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(oldBlock=BP-<someid>:blk_1094097355_20357090,
> newgenerationstamp=20552508, newlength=18642324, newtargets=[<DN1>:50010],
> closeFile=true, deleteBlock=false){code}
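For context, the IllegalStateException in the quoted stack trace comes from a Guava Preconditions-style state check: the NameNode will only finalize a file once every block is in the COMPLETE state, but commitBlockSynchronization() here arrives while blocks[0] is still COMMITTED. A minimal sketch of that kind of check, with simplified, hypothetical names (not the actual HDFS implementation):

```java
// Sketch of the invariant enforced by INodeFile.assertAllBlocksComplete():
// finalizing a file requires every block to be COMPLETE. A block that is
// still COMMITTED (replica reported, minimal replication not yet satisfied)
// makes the check throw, which is what fails the block recovery above.
public class BlockStateCheck {
    // Simplified stand-in for HDFS's block under-construction states.
    enum BlockUCState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

    // Mirrors the Preconditions.checkState call at the top of the trace.
    static void assertAllBlocksComplete(BlockUCState[] blocks) {
        for (int i = 0; i < blocks.length; i++) {
            if (blocks[i] != BlockUCState.COMPLETE) {
                throw new IllegalStateException(
                    "blocks[" + i + "] is non-complete: " + blocks[i]);
            }
        }
    }

    public static void main(String[] args) {
        // Reproduces the failure mode: the last block is only COMMITTED.
        try {
            assertAllBlocksComplete(
                new BlockUCState[] { BlockUCState.COMMITTED });
        } catch (IllegalStateException e) {
            System.out.println("finalize failed: " + e.getMessage());
        }
    }
}
```

Since the exception propagates out of commitBlockSynchronization(), the recovery attempt fails as a whole and the block stays under-replicated, which matches the behavior reported above.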
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)