whutwhu opened a new pull request #7307: URL: https://github.com/apache/trafficserver/pull/7307
TCP_RETRY is the default mode for our DNS queries because some of our DNS responses are larger than 512 bytes. For this type of query, we try UDP first; if that does not work (the truncated flag is set in the HEADER), we switch to a TCP query.

We found a DNS spike problem in one corner case under the TCP_RETRY setting:

1. The DNS query needs to run over a TCP connection.
2. The TCP connection is broken, but the UDP connection is fine.

In this corner case, if the QPS for the host is around 2k~3k, ATS keeps sending UDP queries frequently but cannot accept the responses. This situation can last for about 1 minute. During that minute a huge number of DNS UDP queries are sent to the DNS service (around 4k~5k QPS for the query of a single URL). As a result, the DNS service sees a spike and drops queries.

This PR fixes the problem as follows:

1. Add a threshold for consecutive TCP query failures on the TCP connection. The default value is 10 and it is configurable. A threshold of 0 (or less than 0) disables this feature.
2. If the number of consecutive TCP query failures exceeds the threshold, reset the TCP connection immediately.
3. Add two DNS metrics that count TCP retries and TCP resets, as well as some Warning/Debug messages for logging.
4. Enhance open_con to return the status of the connection setup.
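The TCP_RETRY fallback and the failure threshold described above can be sketched in C++ roughly as follows. This is a minimal, hypothetical illustration and not the code in this PR. `is_truncated`, `TcpFailureTracker` and its methods are invented names. The sketch also assumes that a reset fires once consecutive failures exceed the threshold, rather than when they reach it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the actual ATS API: returns true if the TC
// (truncated) bit is set in a DNS message header (RFC 1035 layout).
inline bool is_truncated(const std::uint8_t *msg, std::size_t len)
{
  constexpr std::size_t kDnsHeaderLen = 12; // the DNS header is always 12 bytes
  if (msg == nullptr || len < kDnsHeaderLen) {
    return false;
  }
  // Byte 2 holds QR | Opcode (4 bits) | AA | TC | RD, so TC is mask 0x02.
  return (msg[2] & 0x02) != 0;
}

// Hypothetical tracker for consecutive TCP query failures on one TCP
// connection. A threshold of 0 or less disables the reset logic.
class TcpFailureTracker
{
public:
  explicit TcpFailureTracker(int threshold) : threshold_(threshold) {}

  // Call each time a query falls back from UDP to TCP (the "TCP retry" counter).
  void note_tcp_retry() { ++tcp_retries_; }

  // Call when a TCP query fails. Returns true when the number of consecutive
  // failures exceeds the threshold, i.e. the caller should reset the TCP
  // connection immediately.
  bool record_failure()
  {
    if (threshold_ <= 0) {
      return false; // feature disabled
    }
    if (++consecutive_failures_ > threshold_) {
      consecutive_failures_ = 0; // give the reopened connection a fresh budget
      ++tcp_resets_;             // the "TCP reset" counter
      return true;
    }
    return false;
  }

  // Any successful TCP query breaks the run of consecutive failures.
  void record_success() { consecutive_failures_ = 0; }

  long tcp_retries() const { return tcp_retries_; }
  long tcp_resets() const { return tcp_resets_; }

private:
  int threshold_;
  int consecutive_failures_ = 0;
  long tcp_retries_ = 0;
  long tcp_resets_ = 0;
};
```

In this sketch, a caller checks `is_truncated` on each UDP response and calls `note_tcp_retry()` when it falls back to TCP. Whenever `record_failure()` returns true, the caller closes and reopens the TCP connection instead of letting UDP queries pile up against a dead TCP path. With a threshold of 10, the 11th consecutive failure triggers the reset.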
