Tim Armstrong created HDFS-14479:
------------------------------------

             Summary: Closing of HDFS file handle can be quite slow if file was deleted under the client
                 Key: HDFS-14479
                 URL: https://issues.apache.org/jira/browse/HDFS-14479
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Tim Armstrong


In IMPALA-7176 we saw that hdfsClose() could sometimes take upwards of a second 
if the directory containing the file was deleted from underneath the writer. 
The error we get is like this:

{noformat}
Error(2): No such file or directory
Root cause: RemoteException: File does not exist: /test-warehouse/functional_parquet.db/alltypesinsert/_impala_insert_staging/3f4e3729014920fd_6348d0a700000000/.3f4e3729014920fd-6348d0a700000006_1116334753_dir/year=2009/month=90/3f4e3729014920fd-6348d0a700000006_1538863142_data.0.parq (inode 111416) [Lease.  Holder: DFSClient_NONMAPREDUCE_1345999117_1, pending creates: 237]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2782)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2661)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
{noformat}

A second doesn't sound so bad, but it really adds up if you have a significant 
number of files open (e.g. inserting into a partitioned Hive table): just 
cleaning up the open file handles ties up a thread for a long time. It would be 
helpful for us if this were faster.
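
To give a rough sense of scale, here is a back-of-the-envelope sketch. It assumes 
that one thread closes the handles one after another and that every close hits 
the slow path at about one second each. The count of 237 is taken from the 
"pending creates" figure in the lease shown in the trace above. The function name 
is hypothetical and the figures are illustrative, not measurements.

```python
def serial_close_time(num_handles: int, seconds_per_close: float) -> float:
    """Wall-clock time for one thread to close handles one after another."""
    return num_handles * seconds_per_close


# Assumed inputs: 237 open files (the "pending creates" count in the lease
# above) and ~1 second per hdfsClose() call in the slow case.
total_seconds = serial_close_time(237, 1.0)
print(f"cleanup would tie up the thread for about {total_seconds:.0f} seconds")
```

Under those assumptions, the cleanup alone keeps a thread busy for several minutes.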



