Tim Armstrong created HDFS-14479:
------------------------------------
Summary: Closing of HDFS file handle can be quite slow if file was
deleted under the client
Key: HDFS-14479
URL: https://issues.apache.org/jira/browse/HDFS-14479
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Tim Armstrong
In IMPALA-7176 we saw that hdfsClose() could sometimes take upwards of a second
if the directory containing the file was deleted from underneath the writer.
The error we get is like this:
{noformat}
Error(2): No such file or directory
Root cause: RemoteException: File does not exist: /test-warehouse/functional_parquet.db/alltypesinsert/_impala_insert_staging/3f4e3729014920fd_6348d0a700000000/.3f4e3729014920fd-6348d0a700000006_1116334753_dir/year=2009/month=90/3f4e3729014920fd-6348d0a700000006_1538863142_data.0.parq (inode 111416) [Lease. Holder: DFSClient_NONMAPREDUCE_1345999117_1, pending creates: 237]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2782)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2661)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
{noformat}
A second may not sound like much, but it adds up when a significant number of
files are open (e.g. when inserting into a partitioned Hive table): just
cleaning up the open file handles ties up a thread for a long time. It would be
helpful for us if this were faster.
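
To make "adds up" concrete, here is a rough back-of-the-envelope sketch. It assumes the handles are closed one after another on a single thread and that every close hits the slow path at about one second; the count of 237 is taken from "pending creates: 237" in the lease message above, and the 10 ms comparison figure is purely hypothetical. The function name is made up for illustration.

```python
def total_close_seconds(open_files: int, seconds_per_close: float) -> float:
    """Time one thread spends closing handles sequentially."""
    return open_files * seconds_per_close


# 237 comes from "pending creates: 237" in the lease message; ~1 s per
# close is the slow case reported in this issue (assumed here for every file).
slow_total = total_close_seconds(237, 1.0)

# Hypothetical comparison: the same cleanup if each close took 10 ms.
fast_total = total_close_seconds(237, 0.01)

print(f"slow path: {slow_total:.0f} s (~{slow_total / 60:.1f} min)")
print(f"fast path: {fast_total:.2f} s")
```

Under those assumptions, cleanup alone would occupy the thread for roughly four minutes.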