[ 
https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545750#comment-16545750
 ] 

Chao Sun commented on HDFS-13735:
---------------------------------

[~xkrogen]: I will add this to {{hdfs-default.xml}}. I'd like to decrease the 
timeout so that it can fail quickly. Currently, if the timeout occurs, the 
{{EditLogTailer}} will 
[retry|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java#L468]
 in the next iteration. This is a minor issue for the SBN but not for the 
ObserverNameNode, for two reasons:

1. The HTTP connection is opened while holding the NN read/write lock (see 
[readOp() inside 
FSEditLogLoader|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java#L213]
 and how it calls 
[EditLogFileInputStream#init()|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java#L134]
 to open the HTTP connection to the JNs), so a huge RPC spike could occur when 
the connection times out.
2. The namespace of the ObserverNameNode can be up to 60s stale w.r.t. the 
Active NN because of the long timeout. This may not be acceptable in some 
scenarios.

Since this is mainly an issue for ObserverNameNode, please also let me know 
whether it makes more sense to move this JIRA under HDFS-12943. Thanks.

> Make QJM HTTP URL connection timeout configurable
> -------------------------------------------------
>
>                 Key: HDFS-13735
>                 URL: https://issues.apache.org/jira/browse/HDFS-13735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: qjm
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Minor
>         Attachments: HDFS-13735.000.patch
>
>
> We've seen "connect timed out" happen internally when QJM tries to open HTTP 
> connections to JNs. The code currently uses {{newDefaultURLConnectionFactory}}, 
> which applies a default timeout of 60s that is not configurable.
> It would be better for this to be configurable, especially for 
> ObserverNameNode (HDFS-12943), where latency is important, and 60s may not be 
> a good value.
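
To illustrate the idea, here is a minimal sketch of reading a connection timeout from configuration and applying it when opening an HTTP connection. It uses only {{java.net}} and {{java.util.Properties}} rather than Hadoop's own classes; the class name, the key {{example.qjm.http.connect.timeout.ms}}, and the choice to apply the same value to the read timeout are assumptions made for illustration and are not taken from the patch.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Properties;

public class ConfigurableTimeoutSketch {
    // Hypothetical configuration key, for illustration only.
    static final String CONNECT_TIMEOUT_KEY = "example.qjm.http.connect.timeout.ms";
    // Matches the 60s default described in the issue.
    static final int DEFAULT_TIMEOUT_MS = 60_000;

    // Returns the configured timeout in milliseconds, or the default if unset.
    static int timeoutMs(Properties conf) {
        String raw = conf.getProperty(CONNECT_TIMEOUT_KEY);
        if (raw == null) {
            return DEFAULT_TIMEOUT_MS;
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(
                CONNECT_TIMEOUT_KEY + " must be positive, got " + value);
        }
        return value;
    }

    // Prepares (but does not yet establish) an HTTP connection with the
    // configured timeouts. Expects an http:// or https:// URL.
    static HttpURLConnection open(URL url, Properties conf) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int timeout = timeoutMs(conf);
        conn.setConnectTimeout(timeout);
        // Assumption: reuse the same value for reads; a real change might
        // expose a separate setting.
        conn.setReadTimeout(timeout);
        return conn;
    }
}
```

With such a setting, a latency-sensitive node could lower the value (for example to a few seconds) so that a failed connection attempt returns quickly and is retried in the next tailing iteration, while other deployments keep the 60s default.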



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
