[
https://issues.apache.org/jira/browse/HADOOP-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712650#action_12712650
]
dhruba borthakur commented on HADOOP-2757:
------------------------------------------
> 1. rpc timeout: the patch seems to implement a read timeout not rpc timeout.
As an administrator of a cluster, I find it easier to set a time limit after
which an RPC connection bails out if it is not receiving response data
continuously. I could change it to a true rpcTimeout, but RPCs like
"dfsadmin -report" could truly take a long time because the amount of data to
be transferred might be huge, depending on the size of the cluster. I am
comfortable configuring a cluster in such a way that if an RPC client has been
waiting for more data from the RPC server for more than 30 seconds, then the
client can safely assume that the server is non-responsive. This works even
for RPCs that have to transfer large amounts of data. Do you agree?
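To illustrate the distinction, here is a minimal sketch. It is not code from
the patch, and the class and method names are made up for illustration. It
compares an inactivity limit (the longest allowed gap between chunks of
response data) with a limit on the total duration of the call. In plain Java
sockets, Socket.setSoTimeout behaves like the first kind of limit, since it
bounds each blocking read rather than the whole exchange.

```java
/** Hypothetical helper for illustration only; not part of the patch. */
final class InactivityTimeoutSketch {

    /**
     * arrivalTimesMs holds the arrival time of each response chunk, in
     * increasing order, measured from the moment the request was sent.
     * Returns true if no wait (before the first chunk or between two
     * consecutive chunks) is longer than inactivityMs.
     */
    static boolean survivesInactivityTimeout(long[] arrivalTimesMs,
                                             long inactivityMs) {
        long lastActivity = 0; // the request was sent at t = 0
        for (long t : arrivalTimesMs) {
            if (t - lastActivity > inactivityMs) {
                return false; // the client would have given up here
            }
            lastActivity = t;
        }
        return true;
    }

    /** Returns true if the last chunk arrives within totalMs of the send. */
    static boolean survivesTotalTimeout(long[] arrivalTimesMs, long totalMs) {
        if (arrivalTimesMs.length == 0) {
            return true; // nothing to wait for in this simplified model
        }
        return arrivalTimesMs[arrivalTimesMs.length - 1] <= totalMs;
    }
}
```

With a 30 second limit, a large report that delivers a chunk every 10 seconds
for two minutes passes the inactivity check but fails the total-duration
check, while a server that goes silent for 45 seconds fails the inactivity
check.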
> 2. if we have rpc timeout, why we still need soft mount timeout in
> leaseChecker?
I think we need these two things to be separate. Please see answer to 3a below.
> 3. I think the check "if (now > last + softMountTimeout)" could easily be
> true in normal cases if renewFrequency is set to be the soft mount timeout.
The code sets renewFrequency to be softMountTimeout/3. So, "if renewFrequency
is set to be the soft mount timeout" cannot happen. But I will modify this
portion of the code to handle this case better.
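A minimal sketch of the relationship described above, assuming millisecond
values. The class and method names are hypothetical; only the expression
"now > last + softMountTimeout" and the softMountTimeout/3 ratio come from
this discussion.

```java
/** Hypothetical helper for illustration only; not part of the patch. */
final class LeaseRenewalSketch {

    /** Renew three times per soft mount timeout period. */
    static long renewFrequencyMs(long softMountTimeoutMs) {
        return softMountTimeoutMs / 3;
    }

    /** The check quoted in the review comment. */
    static boolean softMountExpired(long nowMs, long lastRenewedMs,
                                    long softMountTimeoutMs) {
        return nowMs > lastRenewedMs + softMountTimeoutMs;
    }
}
```

With a 30 second soft mount timeout the renewal interval is 10 seconds, so
the check only fires after roughly three consecutive renewals have been
missed. If the interval were equal to the timeout itself, a renewal that ran
even one millisecond late would trip the check, which is the situation the
reviewer is worried about.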
> 3a. I feel that the meaning of soft mount timeout is not clear maybe
The NFS manual says something like this: "The softmount timeout sets the time
the NFS client will wait for a request to complete".
To make things clearer, this patch keeps two configuration values:
ipc.client.inactivity.timeout: the period of inactivity allowed while a client
is waiting for a response.
dfs.softmount.timeout: the max time a DFSClient will wait for a request to
successfully complete.
The ipc.client.inactivity.timeout is set for a single rpc call. The
dfs.softmount.timeout applies to FileSystem operations like DFSClient.close().
> 4. In the file close case, would it be better just to limit the number of
> retries?
In fact, I first deployed a version of the code in our cluster that specified
the max number of retries to be 5. But then, when I was explaining this
behaviour to an app-writer who is writing an app on top of hdfs, it was
difficult for me to explain what it really means. I found it easier to explain
that "this call will not take more than 30 seconds". Also, specifying a "time"
is future-proof in the sense that an hdfs developer can change the frequency
of close-retries without affecting the semantics exposed to the user. If you
feel strongly against this one, I can change it. Please do let me know.
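The time-based approach can be sketched as follows. This is an illustration
under stated assumptions, not the close() loop from the patch:
CloseWithDeadlineSketch, Clock, FakeClock and completeWithin are hypothetical
names, and the fake clock only exists so the behaviour can be checked without
real sleeping.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration only; not part of the patch. */
final class CloseWithDeadlineSketch {

    /** Minimal clock abstraction so the loop can be exercised in tests. */
    interface Clock {
        long nowMs();
        void sleepMs(long ms);
    }

    /** A clock that advances only when asked to sleep. */
    static final class FakeClock implements Clock {
        private long now = 0;
        public long nowMs() { return now; }
        public void sleepMs(long ms) { now += ms; }
    }

    /**
     * Calls tryComplete until it returns true or timeoutMs has elapsed.
     * The retry interval affects how often we try, not how long the
     * caller can be blocked.
     */
    static boolean completeWithin(BooleanSupplier tryComplete, long timeoutMs,
                                  long retryIntervalMs, Clock clock) {
        long deadline = clock.nowMs() + timeoutMs;
        while (true) {
            if (tryComplete.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - clock.nowMs();
            if (remaining <= 0) {
                return false; // give up; the caller decides what to report
            }
            // Never sleep past the deadline.
            clock.sleepMs(Math.min(retryIntervalMs, remaining));
        }
    }
}
```

Whether the retry interval is 400 ms or 7 seconds, a call that never succeeds
returns after the same 30 seconds, so the promise made to the application
writer stays the same when the retry frequency is tuned.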
Thanks for reviewing this one.
> Should DFS outputstream's close wait forever?
> ---------------------------------------------
>
> Key: HADOOP-2757
> URL: https://issues.apache.org/jira/browse/HADOOP-2757
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: dhruba borthakur
> Attachments: softMount1.patch, softMount1.patch, softMount2.patch,
> softMount3.patch
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps
> throwing {{NotYetReplicated}} exception, for whatever reason. It's pretty
> annoying for a user. Should the loop inside close have a timeout? If so, how
> much? It could probably be something like 10 minutes.