[ https://issues.apache.org/jira/browse/HADOOP-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537104 ]

Konstantin Shvachko commented on HADOOP-2087:
---------------------------------------------

From a conversation with Hairong: DFSClient should automatically retry creating 
a file after a 1-minute sleep if it receives AlreadyBeingCreatedException. This 
is part of the retry policy introduced by HADOOP-1263.
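
For reference, a minimal sketch of how the io.retry framework from HADOOP-1263 
can map AlreadyBeingCreatedException to a sleep-and-retry policy on create(). 
The attempt count, the 60-second sleep, and the wrapper method below are 
illustrative assumptions, not the exact DFSClient wiring:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.dfs.AlreadyBeingCreatedException;
import org.apache.hadoop.dfs.ClientProtocol;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;
import org.apache.hadoop.ipc.RemoteException;

public class CreateRetryWiring {

  // Illustrative: retry create() after a fixed sleep when the namenode
  // reports AlreadyBeingCreatedException (it arrives wrapped in a
  // RemoteException over RPC), and fail fast on everything else.
  static ClientProtocol wrapWithRetries(ClientProtocol rpcNamenode) {
    // Assumed values: 5 attempts, 60-second sleep (the "1 minute sleep"
    // mentioned above); not the exact DFSClient constants.
    RetryPolicy createPolicy = RetryPolicies
        .retryUpToMaximumCountWithFixedSleep(5, 60, TimeUnit.SECONDS);

    Map<Class<? extends Exception>, RetryPolicy> remotePolicies =
        new HashMap<Class<? extends Exception>, RetryPolicy>();
    remotePolicies.put(AlreadyBeingCreatedException.class, createPolicy);

    Map<Class<? extends Exception>, RetryPolicy> policies =
        new HashMap<Class<? extends Exception>, RetryPolicy>();
    policies.put(RemoteException.class, RetryPolicies.retryByRemoteException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, remotePolicies));

    RetryPolicy methodPolicy = RetryPolicies.retryByException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, policies);

    // Apply the policy to create() only; other methods keep the default.
    Map<String, RetryPolicy> methodPolicies = new HashMap<String, RetryPolicy>();
    methodPolicies.put("create", methodPolicy);

    return (ClientProtocol) RetryProxy.create(
        ClientProtocol.class, rpcNamenode, methodPolicies);
  }
}
{code}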

# So the first question to answer is whether the retry framework does what it 
is intended to do, that is, retry.
# If it does, then we should look into why the first task (_0) does not shut 
down properly, because that would mean the first task is still up and renewing 
the lease for the file.
# And the last question is whether DFSClient SHOULD do automatic retries on 
AlreadyBeingCreatedException. I think hdfs create() should throw this exception 
to the application, because an application might want to create a different 
file or wait for the lease to expire, depending on its internal logic. In the 
case of TestDFSIO it should wait. Throwing should be explicit, meaning that the 
create api should list AlreadyBeingCreatedException as one of the exceptions 
thrown by create(), rather than just the generic IOException (see the sketch 
after this list).
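
To illustrate the third point, a sketch of what caller-side handling could 
look like if create() declared the exception explicitly. The method name, the 
sleep length, and the attempt limit are hypothetical; TestDFSIO would take the 
"wait" branch, while another application could create a different file instead:

{code:java}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.dfs.AlreadyBeingCreatedException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.ipc.RemoteException;

public class ExplicitCreateHandling {

  // Hypothetical caller-side logic if create() listed
  // AlreadyBeingCreatedException explicitly: the application, not
  // DFSClient, decides how to react to an unexpired lease.
  static OutputStream createWaitingForLease(FileSystem fs, Path f,
      long sleepMs, int maxAttempts) throws IOException, InterruptedException {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return fs.create(f);
      } catch (RemoteException re) {
        // Over RPC the exception arrives wrapped; check its class name.
        if (!AlreadyBeingCreatedException.class.getName()
            .equals(re.getClassName())) {
          throw re;
        }
        // The TestDFSIO choice: wait for the previous holder's lease to
        // expire and try again. Another application might instead pick a
        // different file name here.
        Thread.sleep(sleepMs);
      }
    }
    throw new IOException("Lease on " + f + " did not expire after "
        + maxAttempts + " attempts");
  }
}
{code}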

Whether it's critical for 0.15 or not depends on whether we can run the 
TestDFSIO benchmark. If the exception is thrown every time we run it, then yes.

> Errors for subsequent requests for file creation after original DFSClient 
> goes down..
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2087
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Gautam Kowshik
>             Fix For: 0.15.0
>
>
> Task task_200710200555_0005_m_000725_0 started writing a file and the node 
> went down, so all subsequent attempts to create the file returned 
> AlreadyBeingCreatedException.
> I think DFS should handle the case where, if a DFSClient goes down in the 
> middle of file creation, subsequent creates of the same file are allowed. 
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
> from task_200710200555_0005_m_000725_0: Task 
> task_200710200555_0005_m_000725_0 failed to report status for 606 seconds. 
> Killing!
> 2007-10-20 06:23:51,189 INFO org.apache.hadoop.mapred.JobTracker: Removed 
> completed task 'task_200710200555_0005_m_000725_0' from 
> '[tracker_address]:/127.0.0.1:44198'
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobInProgress: Choosing 
> normal task tip_200710200555_0005_m_000725
> 2007-10-20 06:23:51,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
> 'task_200710200555_0005_m_000725_1' to tip tip_200710200555_0005_m_000725, 
> for tracker '[tracker_address]:/127.0.0.1:50914'
> 2007-10-20 06:28:54,991 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
> from task_200710200555_0005_m_000725_1: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file 
> /benchmarks/TestDFSIO/io_data/test_io_825 for 
> DFSClient_task_200710200555_0005_m_000725_1 on client 72.30.50.198, because 
> this file is already being created by 
> DFSClient_task_200710200555_0005_m_000725_0 on 72.30.53.224
>         at 
> org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:881)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

