[
https://issues.apache.org/jira/browse/HDFS-12142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087565#comment-16087565
]
Kihwal Lee commented on HDFS-12142:
-----------------------------------
The following appears after the file is successfully closed. It seems the
DataStreamer is sometimes left running, and the regular pipeline shutdown is
somehow recognized as a failure.
{noformat}
2017-07-10 20:19:11,870 [IPC Server handler 72 on 8020] INFO ipc.Server: IPC Server handler 72 on 8020, call Call#99 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.updateBlockForPipeline from x.x.x.x:50972
java.io.IOException: Unexpected BlockUCState: BP-yyy:blk_12300000_10000 is COMPLETE but not UNDER_CONSTRUCTION
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:5509)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:5576)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:918)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:971)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:999)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:881)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:810)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2523)
{noformat}
The blocks were all finalized normally with no data loss, but until we know
the actual cause, I can't be sure whether this could ever cause data loss.
> Files may be closed before streamer is done
> -------------------------------------------
>
> Key: HDFS-12142
> URL: https://issues.apache.org/jira/browse/HDFS-12142
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
>
> We're encountering multiple cases of clients calling updateBlockForPipeline
> on completed blocks. Initial analysis suggests the client closes a file,
> completeFile succeeds, and then the client immediately attempts recovery. The
> exception is swallowed on the client and only logged on the NN by checkUCBlock.
> The problem "appears" to be benign (no data loss), but it's unproven whether
> the issue only occurs for successfully closed files. There appears to be very
> poor coordination between the dfs output stream's threads, which leads to
> races that confuse the streamer thread -- which probably should have been
> joined before returning from close.
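As a rough illustration of the race described above (this is a minimal sketch, not the actual DFSOutputStream/DataStreamer code; all class and method names here are hypothetical): if close() tears down the pipeline but returns without joining the streamer thread, the streamer can misread the teardown as a pipeline failure and start recovery after the block is already COMPLETE. Joining the thread before close() returns removes that window.

```java
// Hypothetical sketch of the proposed fix: join the background streamer
// thread before close() returns, so no recovery can start after the file
// has been completed. Not actual Hadoop code.
public class SketchOutputStream implements AutoCloseable {
    private final Thread streamer;
    private volatile boolean running = true;

    public SketchOutputStream() {
        streamer = new Thread(() -> {
            while (running) {
                // ... would send packets; on a perceived pipeline failure it
                // would call updateBlockForPipeline (pipeline recovery).
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return; // shutdown requested
                }
            }
        });
        streamer.start();
    }

    @Override
    public void close() {
        running = false;
        streamer.interrupt();
        try {
            // The coordination the report asks for: wait for the streamer
            // to finish before close() returns, instead of racing with it.
            streamer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean streamerAlive() {
        return streamer.isAlive();
    }

    public static void main(String[] args) {
        SketchOutputStream out = new SketchOutputStream();
        out.close();
        // join() guarantees the streamer has terminated by this point
        System.out.println("streamer alive after close(): " + out.streamerAlive());
        // prints "streamer alive after close(): false"
    }
}
```

Without the join(), the streamer could outlive close() and issue a recovery RPC that the NN rejects with the "COMPLETE but not UNDER_CONSTRUCTION" error seen in the log above.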
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)