Possible performance problem: DNS lookup continuation is using first Network 
ethread for all operations
-------------------------------------------------------------------------------------------------------

                 Key: TS-307
                 URL: https://issues.apache.org/jira/browse/TS-307
             Project: Traffic Server
          Issue Type: Improvement
            Reporter: Miles Libbey
            Priority: Minor


(from yahoo bug 989959)


Original description
by Vladimir Legalov  3 years ago at 2006-12-18 11:57

All DNS lookup operations execute on the first Network thread. Since each
Network thread is already responsible for NetAccept and NetHandler
continuation processing, DNS processing can cause extra CPU usage and
additional delays for this particular thread. It makes sense to move DNS
processing into a completely independent thread (ethread) to avoid possible
performance problems related to DNS lookups.
Such a performance problem would be visible only in "no caching" mode with a
very high rate of OS requests.
Additional performance testing is required to clarify visibility of this
problem.
(It looks like htop is not an appropriate tool to catch precise CPU usage per
thread.)

                

 
Comment 1
 by Leif Hedstrom  3 years ago at 2006-12-26 13:41:10

I think it's highly unlikely that DNS will ever become a bottleneck. Even in
extreme cases, say 300 Origin Servers all with a TTL of 5 minutes (we rarely
have anything shorter), we're looking at one DNS lookup per second (assuming
there are no cache hits, as pointed out already).
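
As a back-of-the-envelope check of that estimate, here is a minimal sketch. It
assumes every origin server's record is re-resolved exactly once per TTL; the
function name is illustrative and not part of Traffic Server.

```python
def lookups_per_second(origin_servers: int, ttl_seconds: float) -> float:
    """Steady-state DNS lookup rate, assuming each origin server's record
    is re-resolved exactly once per TTL (an assumption of this sketch)."""
    return origin_servers / ttl_seconds


# 300 origin servers, each with a 5-minute TTL, as in the comment above.
rate = lookups_per_second(origin_servers=300, ttl_seconds=5 * 60)
print(rate)  # 1.0
```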

I'm closing this bug until we have some real evidence that DNS lookups are ever
going to be any sort of bottleneck.

                

Comment 2
 by Vladimir Legalov  3 years ago at 2006-12-26 20:31:17

I don't understand why we should not keep this RFE open. I would prefer to
move the DNS lookup code into a separate thread, not because of a huge
performance impact, but because the DNS lookup continuation is activated every
11 milliseconds (just to verify the status of the 32 UDP sockets) even when
there is no DNS lookup to perform. One more thing: this continuation affects
eThread scheduling for the first NetHandler continuation.
I am 100% sure that all NetHandler continuations must be symmetrical/equal and
have similar scheduling. I would prefer to reopen this RFE.
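
To put the idle polling in numbers, here is a rough sketch. It assumes each
activation checks all 32 sockets once; the function name is hypothetical, and
the figures say nothing about the actual CPU cost per check.

```python
def idle_polling_rate(period_ms: float, sockets: int) -> tuple[float, float]:
    """Return (activations per second, socket checks per second) for a
    continuation that fires every period_ms and checks every socket each
    time. Both are simple rates, not measured CPU cost."""
    activations = 1000.0 / period_ms
    return activations, activations * sockets


activations, checks = idle_polling_rate(period_ms=11, sockets=32)
print(f"{activations:.1f} activations/s, {checks:.0f} socket checks/s")
```

With an 11 ms period this comes to roughly 91 activations and about 2,900
socket checks per second on the first Network thread, even when no lookup is
pending.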

                

 
Comment 3
 by Ryan Troll 3 years ago at 2006-12-27 06:47:47

Reopened, with *very low* priority.

I'd recommend waiting until the bigger items are done before tackling this.  
Yes, we may be spending time in DNS in
this thread when we don't need to; and maybe a single DNS thread is the right 
answer.  Or maybe modifying the DNS code
to not bother with DNS continuations unless there are outstanding DNS requests 
makes more sense.
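
The second option can be illustrated with a toy simulation. This is only a
sketch under simplifying assumptions: time advances in fixed ticks, and
"outstanding requests" are modelled as time windows. None of the names below
come from the Traffic Server code.

```python
def count_wakeups(duration_ms, period_ms, busy_windows, on_demand):
    """Count how often a periodic DNS poller runs over duration_ms.

    busy_windows: list of (start_ms, end_ms) intervals during which at
                  least one DNS request is outstanding (hypothetical input).
    on_demand:    if True, the poller is skipped on ticks where nothing
                  is outstanding; if False, it runs on every tick.
    """
    wakeups = 0
    for t in range(0, duration_ms, period_ms):
        outstanding = any(start <= t < end for start, end in busy_windows)
        if outstanding or not on_demand:
            wakeups += 1
    return wakeups


# One second of wall time, 11 ms period, requests pending from 100 to 200 ms.
busy = [(100, 200)]
print(count_wakeups(1000, 11, busy, on_demand=False))  # 91
print(count_wakeups(1000, 11, busy, on_demand=True))   # 9
```

A real implementation would more likely schedule the continuation when a
request is issued rather than testing on every tick, but the comparison shows
how much of the periodic work is idle when lookups are rare.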

However, I'd wait on this until we have time to go back and tune it.  It may 
squeeze a little more performance out of
the stack, but I suspect there are bigger wins to be gained through 
enhancements that are being actively requested by
properties; or through enhancements we've already identified.

It makes sense to keep this open so we don't forget about it.  Hopefully we'll 
get to it later this year.

                
Comment 4
 by Leif Hedstrom  3 years ago at 2006-12-27 07:42:47

The reason I closed this bug was that the bug report indicated that this would
be a problem under heavy load, with no caching. I don't believe that to be the
case. In the best case DNS lookups will be of O(1) complexity, and in the worst
case O(n), where n is the number of origin servers. In either case, performing
the actual DNS lookups will be negligible as far as CPU consumption is
concerned.

However, with the comment from Vlad, it seems the concern is about wasting time
on the DNS continuation, which I agree might be worth investigating. But I'd
also like to see some benchmarks on how much this affects us today. I'm not
sure exactly how to test this. Vlad, is it possible to increase the interval at
which the DNS continuation gets scheduled, e.g. have it run every 1 second? Then
we could easily benchmark what effect that has on performance.

                

 
Comment 5
 by Vladimir Legalov  3 years ago at 2006-12-27 19:09:23

The existence of this RFE does not mean that it will be taken up for
development immediately. It is only a reminder.
As I already mentioned in the initial comments for this RFE: "Additional
performance testing is required to clarify visibility of this problem."
We have plenty of similar RFEs, by priority and severity, which are not in
active development. I thought that P4 was clear evidence of such 'dormant'
status.

                

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
