[
https://issues.apache.org/jira/browse/HADOOP-19218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867212#comment-17867212
]
Ayush Saxena commented on HADOOP-19218:
---------------------------------------
[~hexiaoqiao] / [~shahrs87]
I suspect that this breaks TestFSNamesystemLockReport (I tried reverting this
locally and the test passes):
[https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1642/testReport/org.apache.hadoop.hdfs.server.namenode/TestFSNamesystemLockReport/test/]
Can you folks help check? I haven't checked the code, but my guess is that it
may be due to a change in the output of toString(). I was trying to investigate
a bit: is this patch a regression due to HADOOP-18628? Those toString() methods
were last modified there.
Please also check compatibility. If the test failure is due to this patch, it
should be safe, I believe, unless it affects the audit log.
> Avoid DNS lookup while creating IPC Connection object
> -----------------------------------------------------
>
> Key: HADOOP-19218
> URL: https://issues.apache.org/jira/browse/HADOOP-19218
> Project: Hadoop Common
> Issue Type: Improvement
> Components: ipc
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> We have been running HADOOP-18628 in production for quite some time, and
> everything works fine as long as the DNS servers in HA are available.
> Upgrading a single NS server at a time is also a common case and is not
> problematic. Every DNS lookup takes 1ms in general.
> However, we recently encountered a case where 2 out of 4 NS servers went
> down (temporarily, but it's a rare case). With a short-lived DNS cache and a
> 2s NS fallback timeout configured in resolv.conf, any client performing a DNS
> lookup can now encounter a delay of 4s or more. This caused a namenode outage
> because the listener is single threaded and was not able to keep up with the
> large number of unique clients (in direct proportion to the number of DNS
> resolutions every few seconds) initiating connections on the listener port.
> While having 2 out of 4 DNS servers offline is a rare case and the NS
> fallback settings could also be improved, it is important to note that we
> don't need to perform DNS resolution for every new connection if the
> intention is only to improve the insights into VersionMismatch errors thrown
> by the server.
> The proposal is to delay the DNS resolution until the server throws the
> error for an incompatible header or version mismatch. This would also save
> the ~1ms of extra time spent even on a healthy DNS lookup.
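
The deferred-resolution idea in the description above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the actual
patch: the class LazyHostnameSketch, its nested Connection class, the literal()
helper and the injected resolver function are all hypothetical names, and the
real Hadoop IPC server code is structured differently. The connection object
stores only the peer InetAddress when it is created, and the host name is
resolved (and cached) the first time the error path asks for it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class LazyHostnameSketch {

    /** Hypothetical stand-in for a server-side connection object; not Hadoop's class. */
    static final class Connection {
        private final InetAddress peer;
        private final Function<InetAddress, String> resolver;
        private String hostName; // stays null until it is first needed

        Connection(InetAddress peer, Function<InetAddress, String> resolver) {
            // Constructor only stores references: no DNS work on the accept path.
            this.peer = peer;
            this.resolver = resolver;
        }

        /** Textual IP address; does not trigger a DNS lookup. */
        String getHostAddress() {
            return peer.getHostAddress();
        }

        /** Resolves the host name on the first call and caches the result. */
        synchronized String getHostName() {
            if (hostName == null) {
                hostName = resolver.apply(peer);
            }
            return hostName;
        }

        /** Error path: the only place in this sketch that pays for the lookup. */
        String describeVersionMismatch(int clientVersion, int serverVersion) {
            return "Version mismatch from " + getHostName() + "/" + getHostAddress()
                + ": client=" + clientVersion + ", server=" + serverVersion;
        }
    }

    /** Builds an InetAddress from raw bytes, which does not contact DNS. */
    static InetAddress literal(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(
                new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        // Counting stub resolver (assumption for the demo); real code might
        // pass InetAddress::getHostName here instead.
        Function<InetAddress, String> resolver = addr -> {
            lookups.incrementAndGet();
            return "client.example";
        };
        Connection conn = new Connection(literal(10, 0, 0, 1), resolver);
        System.out.println(conn.getHostAddress() + " lookups=" + lookups.get());
        System.out.println(conn.describeVersionMismatch(8, 9) + " lookups=" + lookups.get());
    }
}
```

With this shape, healthy connections never call the resolver, so a slow or
unavailable DNS server would only delay the comparatively rare error-reporting
path rather than every new connection handled by the listener.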