Possible performance problem: DNS lookup continuation is using first Network
ethread for all operations
-------------------------------------------------------------------------------------------------------
Key: TS-307
URL: https://issues.apache.org/jira/browse/TS-307
Project: Traffic Server
Issue Type: Improvement
Reporter: Miles Libbey
Priority: Minor
(from yahoo bug 989959)
Original description
by Vladimir Legalov 3 years ago at 2006-12-18 11:57
All DNS lookup operations are executing on the first Network thread. Since each
Network thread is already responsible for NetAccept & NetHandler continuation
processing, DNS processing can cause extra CPU usage and additional delays for
this particular thread. It makes sense to move DNS processing to an independent
thread (ethread) to avoid possible performance problems related to DNS lookups.
Such a performance problem would be visible only in "no caching" mode with a
very high rate of origin server (OS) requests.
Additional performance testing is required to clarify visibility of this
problem.
(It looks like htop is not an appropriate tool to catch precise CPU usage per
thread.)
Comment 1
by Leif Hedstrom 3 years ago at 2006-12-26 13:41:10
I think it's highly unlikely that DNS will ever become a bottleneck. Even under
extreme cases, like say 300 Origin
Servers all with a TTL of 5 minutes (we rarely have anything shorter), we're
looking at one DNS lookup per second
(assuming there are no cache hits, as pointed out already).
I'm closing this bug until we have some real evidence that DNS lookups are ever
going to be any sort of bottleneck.
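The estimate above is simple arithmetic: if each origin hostname has to be re-resolved once per TTL, the steady-state lookup rate is the number of origins divided by the TTL. A minimal sketch of that calculation follows; the function and parameter names are illustrative only and are not Traffic Server settings.

```cpp
#include <cassert>

// Worst-case steady-state DNS lookup rate, assuming every origin hostname
// is re-resolved exactly once per TTL. Names here are hypothetical and do
// not correspond to any Traffic Server configuration.
constexpr double lookups_per_second(int origin_servers, int ttl_seconds)
{
  return static_cast<double>(origin_servers) / ttl_seconds;
}
```

With 300 origin servers and a 5 minute (300 second) TTL, `lookups_per_second(300, 300)` gives one lookup per second, matching the figure quoted in the comment.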
Comment 2
by Vladimir Legalov 3 years ago at 2006-12-26 20:31:17
I don't understand why we should not keep this RFE open. I would prefer to keep
the DNS lookup code in a separate thread, not because of a huge performance
impact but because the DNS lookup continuation is activated every 11
milliseconds (just to verify the status of the 32 UDP sockets) even if we don't
need to perform a DNS lookup. One more thing - this continuation is impacting
eThread scheduling for the first NetHandler continuation. I am 100% sure that
all NetHandler continuations must be symmetrical/equal and have similar
scheduling. I would prefer to reopen this RFE.
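To put the polling cost in numbers, the sketch below converts the 11 millisecond period and the 32 UDP sockets into per-second rates. It assumes that every activation checks all 32 sockets, which is how the comment reads but has not been confirmed against the code; the function names are illustrative.

```cpp
#include <cassert>

// How often a periodic continuation runs for a given period in milliseconds.
constexpr double wakeups_per_second(double period_ms)
{
  return 1000.0 / period_ms;
}

// Socket status checks per second, ASSUMING each wakeup checks every socket.
constexpr double socket_checks_per_second(double period_ms, int sockets)
{
  return wakeups_per_second(period_ms) * sockets;
}
```

For an 11 ms period this works out to roughly 91 activations and about 2,900 socket checks per second on the first Network thread, whether or not any lookup is in flight.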
Comment 3
by Ryan Troll 3 years ago at 2006-12-27 06:47:47
Reopened, with *very low* priority.
I'd recommend waiting until the bigger items are done before tackling this.
Yes, we may be spending time in DNS in
this thread when we don't need to; and maybe a single DNS thread is the right
answer. Or maybe modifying the DNS code
to not bother with DNS continuations unless there are outstanding DNS requests
makes more sense.
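The second option, polling only while requests are outstanding, could look roughly like the sketch below. `DemandDrivenPoller` is a hypothetical stand-in and not an existing Traffic Server class; it only tracks the in-flight count and tells the caller when the periodic event should be started or cancelled. It also assumes all calls come from a single thread, so it uses no locking.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: decides when a periodic DNS poll is needed.
// Not part of Traffic Server; single-threaded use is assumed.
class DemandDrivenPoller
{
public:
  // Call when a lookup is issued. Returns true if this is the first
  // outstanding request, i.e. the caller should start the periodic poll.
  bool request_started() { return ++outstanding_ == 1; }

  // Call when a lookup completes or times out. Returns true if nothing
  // is left in flight, i.e. the caller should cancel the periodic poll.
  bool request_finished()
  {
    if (outstanding_ == 0) {
      return false; // defensive: unmatched finish, nothing to cancel
    }
    return --outstanding_ == 0;
  }

  // True while at least one lookup is still waiting for a response.
  bool polling_needed() const { return outstanding_ > 0; }

private:
  std::size_t outstanding_ = 0;
};
```

The idea is that an idle proxy never schedules the poll at all, while a busy one behaves as it does today. Whether this is simpler than a dedicated DNS thread would still need to be measured.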
However, I'd wait on this until we have time to go back and tune it. It may
squeeze a little more performance out of
the stack, but I suspect there are bigger wins to be gained through
enhancements that are being actively requested by
properties; or through enhancements we've already identified.
It makes sense to keep this open so we don't forget about it. Hopefully we'll
get to it later this year.
Comment 4
by Leif Hedstrom 3 years ago at 2006-12-27 07:42:47
The reason I closed this bug was that the bug report indicated that this would
be a problem under heavy load, with no caching. I don't believe that to be the
case. In the best case DNS lookups will be of O(1) complexity, and in the worst
case O(n), where n is the number of origin servers. In either of those cases,
performing the actual DNS lookups will be negligible as far as CPU consumption
is concerned.
However, with the comment from Vlad, it seems the concern is about wasting time
on the DNS continuation, which I agree might be worth investigating. But I'd
also like to see some benchmarks on how much this affects us today. I'm not
sure exactly how to test this. Vlad, is it possible to increase the interval at
which the DNS continuation is scheduled, e.g. have it run every 1 second? Then
we could easily benchmark what effect that has on performance.
Comment 5
by Vladimir Legalov 3 years ago at 2006-12-27 19:09:23
The existence of this RFE does not mean that it will be taken up for
development immediately. It is only a reminder.
As I already mentioned in the initial comments for this RFE: "Additional
performance testing is required to
clarify visibility of this problem."
We have plenty of RFEs of similar priority and severity which are not in
active development. I was sure that P4 is clear evidence of such 'dormant'
status.