Tim Armstrong created HDFS-14479:
------------------------------------

             Summary: Closing of HDFS file handle can be quite slow if file was deleted under the client
                 Key: HDFS-14479
                 URL: https://issues.apache.org/jira/browse/HDFS-14479
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Tim Armstrong
In IMPALA-7176 we saw that hdfsClose() could sometimes take upwards of a second if the directory containing the file was deleted from underneath the writer. The error we get looks like this:

{noformat}
Error(2): No such file or directory
Root cause: RemoteException: File does not exist: /test-warehouse/functional_parquet.db/alltypesinsert/_impala_insert_staging/3f4e3729014920fd_6348d0a700000000/.3f4e3729014920fd-6348d0a700000006_1116334753_dir/year=2009/month=90/3f4e3729014920fd-6348d0a700000006_1538863142_data.0.parq (inode 111416) [Lease. Holder: DFSClient_NONMAPREDUCE_1345999117_1, pending creates: 237]
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2782)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2661)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
{noformat}

A second doesn't sound so bad, but it really adds up if you have a significant number of files open (e.g. when inserting into a partitioned Hive table): just cleaning up the open file handles ties up a thread for a long time. It would be helpful for us if this were faster.
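To make the "adds up" point concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every close of a handle whose file was deleted underneath the writer costs about one second (the lower bound reported above) and that the handles are closed one after another on a single thread. The figure 237 is borrowed from the `pending creates` count in the lease message; it is not a measured number of slow closes. The function name `serial_cleanup_seconds` is hypothetical.

```python
# Back-of-the-envelope estimate of cleanup time when closes run serially.
# Assumption (illustrative only): each close of an orphaned handle takes
# roughly `per_close_s` seconds, and closes happen one at a time.


def serial_cleanup_seconds(open_handles: int, per_close_s: float = 1.0) -> float:
    """Total time one thread spends closing `open_handles` handles in sequence."""
    if open_handles < 0 or per_close_s < 0:
        raise ValueError("counts and latencies must be non-negative")
    return open_handles * per_close_s


if __name__ == "__main__":
    # 237 mirrors the "pending creates" count in the lease message above.
    total = serial_cleanup_seconds(237)
    print(f"{total:.0f} s (~{total / 60:.1f} min) of one thread spent in close")
```

Under these assumptions, 237 open handles keep a single thread busy for close to four minutes, which is why per-close latency matters more than the one-second figure suggests on its own.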