Viraj Jasani created HADOOP-19218:
-------------------------------------

             Summary: Avoid DNS lookup while creating IPC Connection object
                 Key: HADOOP-19218
                 URL: https://issues.apache.org/jira/browse/HADOOP-19218
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Viraj Jasani


We have been running HADOOP-18628 in production for quite some time. Everything
works fine as long as the DNS servers in HA are available. Upgrading a single
NS server at a time is also a common case and is not problematic.

However, we recently encountered a case where 2 out of 4 NS servers went down
(temporarily, and this is rare). With a short-lived DNS cache and a 2s NS
fallback timeout configured in resolv.conf, any client performing a DNS lookup
can now encounter a delay of 4s or more. This caused a namenode outage: the
listener thread is single-threaded, and it could not keep up with the large
number of unique clients (in direct proportion to the number of DNS
resolutions every few seconds) initiating connections on the listener port.

While having 2 out of 4 DNS servers offline is a rare case, and the NS fallback
settings could also be improved, it is important to note that we don't need to
perform DNS resolution for every new connection if the intention is only to
improve the insights into VersionMismatch errors thrown by the server.

The proposal is to delay the DNS resolution until the server throws the error
for an incompatible header or version mismatch.
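
To illustrate the idea, here is a minimal sketch of deferring the lookup. It is
not the actual Hadoop IPC code: the class name LazyPeerName, its methods, and
the injected resolver function are all hypothetical, introduced only to show
that the connection setup path can keep just the IP address and pay for the
reverse lookup only when an error message needs the hostname.

```java
import java.net.InetAddress;
import java.util.function.Function;

/**
 * Hypothetical holder for a remote peer's address. The hostname is
 * resolved only on first request, not when the object is created.
 */
final class LazyPeerName {
    private final InetAddress address;
    // Assumption: the lookup is injected so it can be swapped or measured;
    // in practice this could be InetAddress::getHostName.
    private final Function<InetAddress, String> resolver;
    private volatile String hostName; // cached after the first lookup

    LazyPeerName(InetAddress address, Function<InetAddress, String> resolver) {
        this.address = address;
        this.resolver = resolver;
    }

    /** Cheap: returns the textual IP address without any name lookup. */
    String ip() {
        return address.getHostAddress();
    }

    /** Potentially slow: runs the resolver on first call, then uses the cache. */
    String hostName() {
        String h = hostName;
        if (h == null) {
            h = resolver.apply(address);
            hostName = h;
        }
        return h;
    }

    /** True once hostName() has been called and the result cached. */
    boolean isResolved() {
        return hostName != null;
    }
}
```

With something like this, the listener path would only ever call ip(), while
the code that builds the incompatible-header or version-mismatch error would
call hostName(), so a slow DNS server affects only the failing connections.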



