I don't think TCP offload is an issue here. I finally found some time, and am doing something I've not done before.
I put Wireshark on a spare XP machine, turned on tcpdump for each NIC on our firewall, and fired up a web page that I thought would be slow - www.bostonglobe.com - because nobody was likely to have hit it that day. Initial investigation of just the workstation capture shows that it takes almost a full minute for all of the DNS queries to be answered, though the HTTP requests start partway through that and finish within milliseconds of the last DNS response. The odd thing is that Wireshark shows the initial name requests are for AAAA records, and I also noticed some DHCPv6 requests in the trace. So I checked, and yes, IPv6 had been installed on the machine. There's more work to be done on this, but I have to wonder whether the slowness is related to the IPv6 (AAAA) requests not being satisfied. My other hypothesis is that our DCs are overloaded and not responding to name requests in a timely fashion.

I'm going to use mergecap and editcap to produce a slimmed-down, unified trace, so that I can see everything relevant in one place. That should prove quite interesting.

Kurt

On Fri, Jun 17, 2011 at 06:13, Jeff Bunting <[email protected]> wrote:
> Have you looked at the NIC offload settings? I recently had a similar
> problem on my home machine. I was getting a lot of intermittent DNS
> failures, though TCP connectivity (once established) seemed OK. I had
> switched among the ISP DNS, Google DNS, and OpenDNS without any better
> results, so figured it was something with my new equipment - I had recently
> replaced the NIC, switch, and router because of lightning, so where to look
> wasn't immediately evident.
> I eventually fired up Wireshark and saw that header checksums were often
> showing up as zeroes. After disabling offloading, everything was OK again.
>
> Jeff
>
>
> On Mon, Jun 13, 2011 at 12:51 AM, Kurt Buff <[email protected]> wrote:
>>
>> Pardon the lateness of this reply (I've been out of town for a couple
>> of days), but no, the lack of consistency leaves me with questions,
>> not answers - because I have only one instance of inside being slower
>> than outside. Let me be a bit more specific:
>>
>> Date        Machine  Elapsed
>> 2011-05-03  Ext      40m
>> 2011-05-03  Int1     40m
>>
>> 2011-05-06  Ext      2h 46m
>> 2011-05-06  Int1     2h 46m
>> 2011-05-06  Int2     2h 46m
>>
>> 2011-05-08  Ext      40m
>> 2011-05-08  Int1     2h 46m
>> 2011-05-08  Int2     DNF (did not finish)
>>
>> So, before I left work on Wednesday, I scheduled this same task for
>> 12:30 daily on each of the machines. I'll be using the new data for a
>> deeper analysis.
>>
>> Unfortunately, my manager just emailed me that he's "tweaked our DNS
>> setup" while I was out - who knows how, or how that affected things.
>>
>> Sigh.
>>
>> Kurt
>>
>>
>> On Thu, Jun 9, 2011 at 03:25, Andrew S. Baker <[email protected]> wrote:
>> > Isn't the fact that there is no consistency in the data from systems
>> > placed at different points in your network *helpful* in determining
>> > where there is a potential slowdown?
>> >
>> > Users complain about slow performance, and your logging shows that
>> > speeds outside are faster than those inside.
>> >
>> > This would indicate that something on the inside is a bottleneck at
>> > that time...
>> >
>> > It would seem to me that you have more than enough data to drill down
>> > and find out where the issues are taking place.
>> >
>> > ASB (Professional Bio)
>> > Harnessing the Advantages of Technology for the SMB market...
>> >
>> > On Thu, Jun 9, 2011 at 1:12 AM, Kurt Buff <[email protected]> wrote:
>> >>
>> >> All,
>> >>
>> >> I'm in need of a new approach to troubleshooting staff complaints
>> >> about intermittent slowness of web browsing.
>> >> We have about 200 staff members on site. The symptoms are
>> >> intermittent at best, but include some generalized slowness in page
>> >> loads, and occasional complete page misses - that is, staff report
>> >> that a page fails to load at all, with a message that the system
>> >> can't find the page, but hitting refresh will usually bring the page
>> >> right up.
>> >>
>> >> My current testing methodology seems to be getting me nowhere and
>> >> causing me to lose hair in great chunks. I outline the methodology
>> >> below because someone might spot a flaw in it.
>> >>
>> >> I'm not well versed in reading packets, so haven't yet resorted to
>> >> Wireshark or tcpdump, but my testing so far leads me to believe that I
>> >> won't find much that way. If your reading of the situation leads you
>> >> to believe otherwise, I'm all ears. But I'm also really interested in
>> >> hearing other things all y'all might suggest on how to go about this.
>> >>
>> >> Network physical configuration:
>> >> DS3 -> HP 2524 switch -> Sidewinder firewall -> HP 2524 switch ->
>> >> Barracuda web filter -> HP 3400cl switch -> production VLANs
>> >>
>> >> Network logical configuration:
>> >> No VLANs externally; 9 VLANs run over the 3400cl, and 18 VLANs (the
>> >> ones on the 3400cl, plus 9 for test/dev/other) run on the internal
>> >> HP 2524. The firewall is an HA pair (active/passive) and has a VLANed
>> >> interface to the HP 2524, so it sees all of the VLANs.
>> >>
>> >> Other data:
>> >> I've got ntop running at two points on the network - the external
>> >> HP 2524 and the HP 3400cl - and have noted no load anomalies for the
>> >> LAN or the Internet connection.
>> >>
>> >> Testing methodology:
>> >> I have placed a FreeBSD box with a public IP address outside the
>> >> firewall, and two FreeBSD boxes inside the firewall on different
>> >> VLANs.
>> >> One of the internal FreeBSD boxes is on a VLAN that doesn't traverse
>> >> the 3400cl, and the other is on a VLAN that does; both VLANs transit
>> >> the Barracuda, as do all staff machines.
>> >>
>> >> Each box has cURL installed (there's a version for Windows as well),
>> >> and is given an identical list of about 2100 unique URLs (http://fqdn
>> >> only - not http://fqdn/somepath) to resolve and download. I kick off
>> >> the batch files manually - and simultaneously.
>> >>
>> >> The batch file is simple:
>> >>
>> >>   date > /root/out.txt
>> >>   /usr/local/bin/curl -K /root/urls.txt >> /root/out.txt
>> >>   date >> /root/out.txt
>> >>
>> >> The entries in urls.txt are all formatted similarly, e.g.:
>> >>
>> >>   url = "http://www.google.com"
>> >>   -s
>> >>   -w = "%{url_effective}\t%{time_total}\t%{time_namelookup}\n"
>> >>   -o = /dev/null
>> >>
>> >> The output (tab-separated) looks like this:
>> >>
>> >>   http://www.google.com    0.093    0.066
>> >>
>> >> Downloaded data is dumped to /dev/null, but I capture the timings for
>> >> name resolution and for the total transaction so that I can analyze
>> >> them later if I want. I used this method before to identify a problem
>> >> with the DNS proxy on the firewall, so I thought it would be useful
>> >> here as well.
>> >>
>> >> All three boxes use Google for name resolution (8.8.8.8), so that I
>> >> can eliminate variances caused by possible problems with our AD DNS
>> >> infrastructure - I don't think there are any, but....
>> >>
>> >> Currently, our AD DNS points to 8.8.8.8 for its resolvers, but it was
>> >> originally pointed at our ISP's DNS - that change doesn't seem to have
>> >> made a difference in staff experience.
>> >>
>> >> I gathered the URLs from my syslogs, so they are real sites that
>> >> people here visit.
>> >>
>> >> The problem with the results from the methodology:
>> >> Using the same data files each time, timings across all three boxes
>> >> have varied wildly. On Friday of last week, each of the three boxes
>> >> took 40 minutes to run through the list of URLs.
>> >> On Tuesday they each took roughly three hours. Today the external box
>> >> took 40 minutes and one of the internal boxes took about 3 hours, and
>> >> the other internal machine hadn't finished by the time I left work.
>> >> cURL hung on that machine, and I'm going to rebuild it: it had been
>> >> mothballed and only revived for this test, and really needs updating.
>> >> Because there is no consistency in the data, I cannot draw any
>> >> conclusions. I'm going to try a few more runs, but I definitely feel
>> >> the need for a different approach.
>> >>
>> >> Any thoughts you might have will be appreciated. I'm out for the next
>> >> couple of days, so won't be able to try any suggestions until next
>> >> week, but would love to hear from folks on this.
>> >>
>> >> Thanks,
>> >>
>> >> Kurt
>> >>
>> > ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
>> > ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/> ~
>> >
>> > ---
>> > To manage subscriptions click here:
>> > http://lyris.sunbelt-software.com/read/my_forums/
>> > or send an email to [email protected]
>> > with the body: unsubscribe ntsysadmin
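The daily cURL logs lend themselves to a per-URL comparison rather than eyeballing total elapsed times. What follows is a minimal Python sketch, not something from the thread: it assumes the exact `-w` template quoted above (URL, `time_total`, `time_namelookup`, tab-separated), skips the `date` lines, and uses hypothetical names (`parse_timings`, `summarize`, `compare_runs`) and arbitrary 1-second thresholds. Because cURL's `time_total` includes `time_namelookup`, a high lookup share, or inside-versus-outside differences that are mostly lookup time, would point at name resolution rather than at the transfer path.

```python
import math
import sys
from statistics import median


def parse_timings(lines):
    """Map url -> (time_total, time_namelookup) from cURL -w output lines.

    Assumes the template "%{url_effective}\t%{time_total}\t%{time_namelookup}\n";
    lines without exactly three tab-separated fields (such as the `date`
    stamps written before and after a run) are skipped.
    """
    timings = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue
        url, total, lookup = fields
        try:
            timings[url] = (float(total), float(lookup))
        except ValueError:
            continue  # truncated or malformed line, e.g. from a hung run
    return timings


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(timings, slow_lookup=1.0):
    """Aggregate one run; slow_lookup (seconds) is an arbitrary threshold."""
    if not timings:
        return {"count": 0}
    totals = [t for t, _ in timings.values()]
    lookups = [n for _, n in timings.values()]
    total_sum = sum(totals)
    return {
        "count": len(timings),
        "total_secs": round(total_sum, 3),
        # time_total includes time_namelookup, so this is the DNS share.
        "lookup_share": round(sum(lookups) / total_sum, 3) if total_sum else 0.0,
        "lookup_median": median(lookups),
        "lookup_p95": percentile(lookups, 95),
        "slow_lookups": sorted(u for u, (_, n) in timings.items() if n >= slow_lookup),
    }


def compare_runs(outside, inside, margin=1.0):
    """URLs at least `margin` seconds slower inside than outside.

    Returns url -> (extra total seconds, extra lookup seconds).
    """
    diffs = {}
    for url in outside.keys() & inside.keys():
        d_total = inside[url][0] - outside[url][0]
        d_lookup = inside[url][1] - outside[url][1]
        if d_total >= margin:
            diffs[url] = (round(d_total, 3), round(d_lookup, 3))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 curl_timings.py outside-out.txt inside-out.txt
    with open(sys.argv[1]) as f_out, open(sys.argv[2]) as f_in:
        outside, inside = parse_timings(f_out), parse_timings(f_in)
    print("outside:", summarize(outside))
    print("inside: ", summarize(inside))
    for url, (d_total, d_lookup) in sorted(compare_runs(outside, inside).items()):
        print(f"{url}\t+{d_total}s total\t+{d_lookup}s lookup")
```

For example, `python3 curl_timings.py ext-out.txt int1-out.txt` (script and file names hypothetical) would compare the outside box against one inside box for the same 12:30 run. Comparing per URL rather than per run keeps one hung or very slow site from dominating the result.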
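To test the IPv6 hypothesis from the workstation side, one option is to time A and AAAA lookups separately for sites nobody has visited recently. The sketch below is an illustration under assumptions, not part of the thread: it relies on the usual stub-resolver behaviour where `getaddrinfo` with `AF_INET` issues an A query and with `AF_INET6` an AAAA query, and the function names are hypothetical. Consistently slow or failing AAAA lookups alongside fast A lookups would support the AAAA theory; both being slow would point more toward the resolvers themselves (such as the DCs).

```python
import socket
import sys
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, succeeded) for one lookup of `host`.

    `family` is socket.AF_INET (A records) or socket.AF_INET6 (AAAA records).
    `resolver` defaults to socket.getaddrinfo and is a parameter only so the
    function can be exercised without network access.
    """
    start = time.perf_counter()
    try:
        resolver(host, 80, family, socket.SOCK_STREAM)
        ok = True
    except socket.gaierror:
        ok = False  # e.g. no record of that type, or the lookup failed/timed out
    return time.perf_counter() - start, ok


def compare_families(hosts, resolver=socket.getaddrinfo):
    """Map each host to {"A": (secs, ok), "AAAA": (secs, ok)}."""
    return {
        host: {
            "A": time_lookup(host, socket.AF_INET, resolver),
            "AAAA": time_lookup(host, socket.AF_INET6, resolver),
        }
        for host in hosts
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 a_vs_aaaa.py www.bostonglobe.com other.host.name ...
    for host, res in compare_families(sys.argv[1:]).items():
        (a_secs, a_ok), (aaaa_secs, aaaa_ok) = res["A"], res["AAAA"]
        print(f"{host}\tA {a_secs:.3f}s ok={a_ok}\tAAAA {aaaa_secs:.3f}s ok={aaaa_ok}")
```

Caching in the local or upstream resolver will flatten repeat measurements, so only the first lookup of each host is really informative; running the same hostnames from a machine with IPv6 removed would give a rough baseline.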
