[ https://issues.apache.org/jira/browse/HADOOP-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712650#action_12712650 ]

dhruba borthakur commented on HADOOP-2757:
------------------------------------------

> 1. RPC timeout: the patch seems to implement a read timeout, not an RPC timeout. 

As an administrator of a cluster, I find it easier to set a time limit for an 
RPC connection to bail out if it is not receiving response data continuously. I 
could change it to a true rpcTimeout, but RPCs like "dfsadmin -report" could 
genuinely take a long time, because the amount of data to be transferred might 
be huge depending on the size of the cluster. I am comfortable configuring a 
cluster so that if an RPC client has been waiting for more data from the RPC 
server for more than 30 seconds, the client can safely assume that the server 
is non-responsive. This works even for RPCs that have to transfer large 
amounts of data. Do you agree? 
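
To illustrate the distinction, here is a rough sketch of an inactivity-style 
timeout (not the actual patch; the host, port, and class name below are made 
up for illustration). A plain socket read timeout resets every time any 
response bytes arrive, so a long-running RPC that streams data steadily never 
trips it:

{code:java}
import java.io.DataInputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class InactivityTimeoutSketch {
  public static void main(String[] args) throws Exception {
    Socket socket = new Socket();
    socket.connect(new InetSocketAddress("namenode.example.com", 9000));
    // Inactivity timeout: bail out only if *no* bytes arrive for 30
    // seconds. An RPC that keeps streaming response data never trips it.
    socket.setSoTimeout(30 * 1000);
    DataInputStream in = new DataInputStream(socket.getInputStream());
    byte[] buf = new byte[4096];
    try {
      while (in.read(buf) != -1) {
        // each successful read resets the inactivity clock
      }
    } catch (SocketTimeoutException e) {
      // no response data for 30 seconds: assume the server is non-responsive
      System.err.println("server non-responsive: " + e.getMessage());
    } finally {
      socket.close();
    }
  }
}
{code}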

> 2. If we have an RPC timeout, why do we still need the soft mount timeout in 
> leaseChecker? 
I think we need these two things to be separate. Please see the answer to 3a 
below. 

> 3. I think the check "if (now > last + softMountTimeout)" could easily be 
> true in normal cases if renewFrequency is set to be the soft mount timeout. 
The code sets renewFrequency to softMountTimeout/3, so renewFrequency being 
equal to the soft mount timeout cannot happen. But I will modify this portion 
of the code to handle that case more gracefully. 
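
For reference, this is roughly the timing I am describing (a hypothetical 
sketch, not the actual LeaseChecker code; the interface is made up). Renewing 
at softMountTimeout/3 means roughly three consecutive renewals must fail 
before the soft mount check fires:

{code:java}
import java.io.IOException;

// Hypothetical sketch of the lease-renewal timing, not the actual patch.
public class LeaseCheckerSketch implements Runnable {
  interface LeaseRenewer { void renewLease() throws IOException; }

  private final LeaseRenewer namenode;
  private final long softMountTimeout;  // dfs.softmount.timeout
  private final long renewFrequency;    // a third of the soft mount window

  LeaseCheckerSketch(LeaseRenewer namenode, long softMountTimeoutMs) {
    this.namenode = namenode;
    this.softMountTimeout = softMountTimeoutMs;
    this.renewFrequency = softMountTimeoutMs / 3;
  }

  public void run() {
    long lastRenewed = System.currentTimeMillis();
    while (!Thread.interrupted()) {
      long now = System.currentTimeMillis();
      if (now - lastRenewed > renewFrequency) {
        try {
          namenode.renewLease();
          lastRenewed = System.currentTimeMillis();
        } catch (IOException e) {
          // renewal failed; keep retrying inside the soft mount window
        }
      }
      if (now > lastRenewed + softMountTimeout) {
        // roughly three renewals in a row have failed: give up
        throw new RuntimeException("soft mount timeout expired");
      }
      try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
    }
  }
}
{code}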

> 3a. I feel that the meaning of soft mount timeout is maybe not clear 
The NFS manual says something like this: "The softmount timeout sets the time 
the NFS client will wait for a request to complete." 
To make things clearer, this patch keeps two configuration values: 
 ipc.client.inactivity.timeout: the period of inactivity allowed while a 
client is waiting for a response 
 dfs.softmount.timeout: the maximum time a DFSClient will wait for a request 
to complete successfully 
The ipc.client.inactivity.timeout is enforced for a single RPC call. The 
dfs.softmount.timeout applies to FileSystem operations like DFSClient.close(). 
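
For example, a client could pick up the two values like this (the key names 
are the ones in the patch; the 30-second defaults here are only illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SoftMountConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // per-RPC inactivity limit: how long to wait for more response data
    int inactivityTimeout =
        conf.getInt("ipc.client.inactivity.timeout", 30 * 1000);
    // whole-operation deadline for calls like DFSClient.close()
    long softMountTimeout =
        conf.getLong("dfs.softmount.timeout", 30 * 1000L);
    System.out.println("inactivity=" + inactivityTimeout + "ms, softmount="
        + softMountTimeout + "ms");
  }
}
{code}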

> 4. In the file close case, would it be better just to limit the number of 
> retries? 
In fact, I first deployed a version of the code in our cluster that capped the 
number of retries at 5. But when I explained this behaviour to an app-writer 
who is building an app on top of HDFS, it was difficult for me to explain what 
it really means. I found it easier to say "this call will not take more than 
30 seconds". Specifying a time is also future-proof, in the sense that an HDFS 
developer can change the frequency of close-retries without affecting the 
semantics exposed to the user. If you feel strongly against this one I can 
change it; please do let me know. 
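
Concretely, the close loop becomes deadline-driven instead of count-driven, 
along these lines (a hypothetical sketch, not the patch itself; the Closer 
interface stands in for the real retry target):

{code:java}
import java.io.IOException;

// Hypothetical sketch: bound close() by wall-clock time, not retry count.
public class DeadlineCloseSketch {
  interface Closer { boolean tryComplete() throws IOException; }

  static void close(Closer stream, long softMountTimeoutMs)
      throws IOException {
    long deadline = System.currentTimeMillis() + softMountTimeoutMs;
    // e.g. the Namenode keeps answering with NotYetReplicated
    while (!stream.tryComplete()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IOException("close() did not finish within "
            + softMountTimeoutMs + " ms");
      }
      // the sleep interval can change in a future release without
      // altering the user-visible promise: "close() takes at most N seconds"
      try { Thread.sleep(400); } catch (InterruptedException e) {
        throw new IOException("interrupted while closing");
      }
    }
  }
}
{code}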

thanks for reviewing this one. 

> Should DFS outputstream's close wait forever?
> ---------------------------------------------
>
>                 Key: HADOOP-2757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2757
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>         Attachments: softMount1.patch, softMount1.patch, softMount2.patch, 
> softMount3.patch
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps 
> throwing {{NotYetReplicated}} exceptions, for whatever reason. It's pretty 
> annoying for a user. Should the loop inside close have a timeout? If so, how 
> long? It could probably be something like 10 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
