[
https://issues.apache.org/jira/browse/HDFS-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shengjinmiao updated HDFS-6592:
-------------------------------
Summary: Use Fluentd to collect data to append to HDFS. Throw the
AlreadyBeingCreatedException exception (was: Use Fluent to collect data to
append to HDFS. Throw the AlreadyBeingCreatedException exception)
> Use Fluentd to collect data to append to HDFS. Throw the
> AlreadyBeingCreatedException exception
> -----------------------------------------------------------------------------------------------
>
> Key: HDFS-6592
> URL: https://issues.apache.org/jira/browse/HDFS-6592
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: jack
>
> We use Fluentd to collect log data and append it to files in HDFS.
> The cluster configuration:
> NameNode: namenode1 (hostname)
> Secondary NameNode: namenode2
> 3 DataNodes: datanode1, datanode2, datanode3
> Replication factor: 3
> Every few days, we suffer from the following exception:
> Exception on namenode1:
> 2014-06-22 09:54:41,892 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.append: Failed to create file [file_nameXXXXX] for
> [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because
> this file is already being created by
> [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]
> 2014-06-22 09:54:41,892 WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:hadoop (auth:SIMPLE)
> cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to
> create file [file_nameXXXXX] for [DFSClient_NONMAPREDUCE_-1425263782_2027206]
> on client [dataNode1], because this file is already being created by
> [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]
> Exception on datanode1:
> 2014-06-22 09:54:45,771 WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException:
> Unable to close file because the last block does not have enough number of
> replicas.
> 2014-06-22 09:54:45,813 WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:hadoop (auth:SIMPLE)
> cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> Failed to create file [file_nameXXXXX] for
> [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client [dataNode1], because
> this file is already being created by
> [DFSClient_NONMAPREDUCE_349196146_2027206] on [dataNode1]
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2441)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
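> For context, our Fluentd output roughly does the following on every flush. This is
> only a minimal, hypothetical sketch (the class name and the per-flush
> FileSystem.newInstance() are our simplifications, not the actual plugin code); it
> illustrates how each flush opens the file for append with its own FileSystem/DFSClient,
> which matches the two distinct DFSClient names in the logs above:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class AppendFlushSketch {
>       // Append one buffered chunk of log records to a file in HDFS.
>       public static void flush(byte[] records, String file) throws Exception {
>           Configuration conf = new Configuration();
>           // newInstance() gives a fresh FileSystem (and DFSClient) per flush.
>           FileSystem fs = FileSystem.newInstance(conf);
>           Path path = new Path(file);
>           FSDataOutputStream out = fs.exists(path) ? fs.append(path) : fs.create(path);
>           try {
>               out.write(records);
>               out.hflush();
>           } finally {
>               // If close() cannot complete (e.g. the last block is not minimally
>               // replicated), the NameNode keeps holding this client's lease, and
>               // the next flush's append() fails with AlreadyBeingCreatedException.
>               out.close();
>               fs.close();
>           }
>       }
>   }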
> From these logs, we infer the following sequence of events:
> 1. The NameNode updates the pipeline with just one datanode:
> namenode1 log: 2014-06-22 09:54:16,604 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline
> (block=BP-1611177164-datanode1-1399894698024:blk_1074496235_1935947,
> newGenerationStamp=1935951, newLength=98839816, newNodes=[datanode1:50010],
> clientName=DFSClient_NONMAPREDUCE_349196146_2027206)
> 2. The DFSClient on datanode1 fails to close the file:
> datanode1 log: 2014-06-22 09:54:26,569 INFO
> org.apache.hadoop.hdfs.DFSClient: Could not complete file_name retrying...
> 3. Subsequently collected data from Fluentd triggers another DFSClient to append
> to the same file:
> namenode1 log: 2014-06-22 09:54:41,892 WARN
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hadoop (auth:SIMPLE)
> cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
> Failed to create file [file_name] for
> [DFSClient_NONMAPREDUCE_-1425263782_2027206] on client
> [datanode1], because this file is already being created by
> [DFSClient_NONMAPREDUCE_349196146_2027206] on [datanode1]
> 4. The new DFSClient then triggers lease recovery every LEASE_SOFTLIMIT_PERIOD
> (a recoverLease sketch follows after step 6):
>
> namenode1 log: 2014-06-22 09:58:34,722 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover
> [Lease. Holder:
> DFSClient_NONMAPREDUCE_349196146_2027206, pendingcreates: 1],
> src=file_name client DFSClient_NONMAPREDUCE_349196146_2027206
> 5. The lease recovery fails:
> namenode1 log:
> 2014-06-22 09:58:34,722 WARN
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hadoop (auth:SIMPLE)
> cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
> DIR* NameSystem.internalReleaseLease: Failed to release lease for file
> file_name. Committed
> blocks are waiting to be minimally replicated. Try again later.
> datanode1 log:
> 2014-06-22 09:56:15,149 WARN
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hadoop (auth:SIMPLE)
>
> cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> DIR* NameSystem.internalReleaseLease: Failed to release
> lease for file file_name. Committed blocks are waiting to be minimally
> replicated. Try again later.
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3722)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2429)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2277)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2505)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2468)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:516)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:340)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> 6. The LEASE_HARDLIMIT_PERIOD expires:
> namenode1 log: WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.append: failed to create file file_name for DFSClient on client
> datanode1 because pendingCreates is non-null but no leases found.
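> Regarding steps 4 and 5, a minimal sketch of how a client could explicitly ask the
> NameNode to recover the stale lease before retrying the append, using
> DistributedFileSystem.recoverLease(). This is only an assumed workaround, not
> something we have verified; as step 5 shows, recoverLease() can keep failing while
> the committed last block is waiting to be minimally replicated:
>
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.hdfs.DistributedFileSystem;
>
>   public class AppendWithLeaseRecoverySketch {
>       // Try to release the previous client's lease, then reopen the file for append.
>       public static FSDataOutputStream appendWithRecovery(DistributedFileSystem dfs,
>                                                           Path path,
>                                                           int maxAttempts) throws Exception {
>           for (int attempt = 1; attempt <= maxAttempts; attempt++) {
>               // recoverLease() returns true once the file has been closed and
>               // its lease released on the NameNode.
>               if (dfs.recoverLease(path)) {
>                   return dfs.append(path);
>               }
>               // Matches step 5: "Committed blocks are waiting to be minimally
>               // replicated. Try again later." Arbitrary back-off before retrying.
>               Thread.sleep(4000L);
>           }
>           throw new java.io.IOException("Lease on " + path + " was not recovered after "
>               + maxAttempts + " attempts");
>       }
>   }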
> We found that HDFS-4504 may be related to our issue, but this still needs to be confirmed.
--
This message was sent by Atlassian JIRA
(v6.2#6252)