[ https://issues.apache.org/jira/browse/HADOOP-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636167#action_12636167 ]
Christian Kunz commented on HADOOP-4318:
----------------------------------------

I would be surprised if this is the same issue. Part of this issue seems to be that distcp tries to work around the AlreadyBeingCreatedException by deleting the file, but it uses the *wrong path*.

From what I can see, a distcp task starts copying into a destination file, fails because of some issue, and the next retries cannot create the destination file because there is still a lease on it. Attempts to delete the file do not succeed because they use the *wrong path* /user/.../3164.

Here is the corresponding log of the namenode:

2008-09-30 22:54:49,429 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
2008-09-30 22:54:49,429 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 8600, call .../_distcp_tmp_dvml74/3169, rw-r--r--, DFSClient_task_200809121811_0034_m_001085_1, true, 3, 134217728) from xxx.yyy.zzz.uuu:36614: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file ..._distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current lease holder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

2008-09-30 22:54:49,431 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove */user/.../3169* because it does not exist.

> distcp fails
> ------------
>
>                 Key: HADOOP-4318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4318
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.17.3
>
> we run distcp between two clusters running 0.17.2 using hdfs.
> As long as one of the tasks fails after opening a file for writing (which typically always happens), subsequent retries will always fail with the following exception (we did not see this with 0.16.3, seems to be a regression):
> 2008-09-30 22:54:49,430 INFO org.apache.hadoop.util.CopyFiles: FAIL 3169 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file xxx/_distcp_tmp_dvml74/3169 for DFSClient_task_200809121811_0034_m_001085_1 on client xxx.yyy.zzz.uuu because current leaseholder is trying to recreate file.
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:1010)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:967)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:269)
>         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
>         at org.apache.hadoop.ipc.Client.call(Client.java:557)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2192)
>         at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:479)
>         at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:138)
>         at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.create(CopyFiles.java:317)
>         at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.copy(CopyFiles.java:369)
>         at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:493)
>         at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:268)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
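For illustration only: the path mismatch described in the comment above can be sketched as follows. This is not the actual CopyFiles code; the directory names (`/user/dst`, `_distcp_tmp_dvml74`) and helper methods are hypothetical placeholders. The point is that the task writes into the job's temp directory, so a retry's cleanup delete aimed at the final destination path (e.g. /user/.../3169) finds nothing to remove, while the half-written temp file keeps its lease and the next create fails with AlreadyBeingCreatedException.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the suspected bug: cleanup-on-retry computes the wrong path.
// All paths here are hypothetical placeholders, not the truncated paths
// from the actual namenode log.
public class DistcpRetryPathSketch {
    // Where the map task actually writes the file during the copy.
    static final Path TMP_DIR = Paths.get("/user/dst/_distcp_tmp_dvml74");
    // Final destination directory, where the file lands only after a
    // successful copy.
    static final Path DST_DIR = Paths.get("/user/dst");

    // Behavior the comment describes: the retry's cleanup delete targets
    // the final destination path, which does not exist yet, so the delete
    // is a no-op ("failed to remove ... because it does not exist").
    static Path buggyCleanupTarget(String seq) {
        return DST_DIR.resolve(seq);
    }

    // What the retry would need to delete to release the half-written
    // file (and its lease) before re-creating it.
    static Path correctCleanupTarget(String seq) {
        return TMP_DIR.resolve(seq);
    }

    public static void main(String[] args) {
        String seq = "3169";
        System.out.println("buggy delete target:   " + buggyCleanupTarget(seq));
        System.out.println("correct delete target: " + correctCleanupTarget(seq));
    }
}
```

Note that deleting the temp file alone may still not be sufficient on 0.17.x, since the namenode holds the lease until it expires or is recovered; the sketch only illustrates the path discrepancy visible in the log lines above.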