Have you looked at the NIC offload settings? I recently had a similar problem on my home machine. I was getting a lot of intermittent DNS failures, though TCP connectivity (once established) seemed OK. I had switched among the ISP DNS, Google DNS, and OpenDNS without any better results, so I figured it was something with my new equipment. I had recently replaced the NIC, switch, and router because of lightning, so where to look wasn't immediately evident.
I eventually fired up Wireshark and saw that header checksums were often showing up as zeroes. After disabling offloading, everything was OK again.

Jeff

On Mon, Jun 13, 2011 at 12:51 AM, Kurt Buff <[email protected]> wrote:
> Pardon the lateness of this reply (I've been out of town for a couple
> of days), but no, the lack of consistency leaves me with questions,
> not answers - because I have only one instance of inside being slower
> than outside. Let me be a bit more specific:
>
> Date        Machine  Elapsed
> 2011-05-03  Ext      40m
> 2011-05-03  Int1     40m
>
> 2011-05-06  Ext      2h 46m
> 2011-05-06  Int1     2h 46m
> 2011-05-06  Int2     2h 46m
>
> 2011-05-08  Ext      40m
> 2011-05-06  Int1     2h 46m
> 2011-05-06  Int2     DNF
>
> So, before I left work on Wednesday, I scheduled this same task for
> 12:30 daily on each of the machines. I'll be using the new data for a
> deeper analysis.
>
> Unfortunately, my manager just emailed me that he's "tweaked our DNS
> setup" while I was out - who knows how, or how that affected things.
>
> Sigh.
>
> Kurt
>
>
> On Thu, Jun 9, 2011 at 03:25, Andrew S. Baker <[email protected]> wrote:
> > Isn't the fact that there is no consistency in the data from systems
> > placed at different points in your network *helpful* in determining where
> > there is a potential slowdown?
> >
> > Users complain about slow performance, and your logging shows that speeds
> > outside are faster than those inside.
> >
> > This would indicate that something on the inside is a bottleneck at that
> > time...
> >
> > It would seem to me that you have more than enough data to drill down and
> > find out where the issues are taking place.
> >
> > ASB (Professional Bio)
> > Harnessing the Advantages of Technology for the SMB market...
> >
> >
> > On Thu, Jun 9, 2011 at 1:12 AM, Kurt Buff <[email protected]> wrote:
> >>
> >> All,
> >>
> >> I'm in need of a new approach to troubleshooting staff complaints
> >> about intermittent slowness of web browsing.
> >> We have about 200 staff
> >> members on site, the symptoms are intermittent at best, but include
> >> some generalized slowness in page loads, and occasional complete page
> >> misses - that is, staff report that a page fails to load at all, with
> >> a message that the system can't find the page, but hitting refresh
> >> will usually bring the page right up.
> >>
> >> My current testing methodology seems to be getting me nowhere and
> >> causing me to lose hair in great chunks. I outline the methodology
> >> below because someone might spot a flaw in it.
> >>
> >> I'm not well versed in reading packets, so haven't yet resorted to
> >> wireshark or tcpdump, but my testing so far leads me to believe that I
> >> won't find much that way. If your reading of the situation leads you
> >> to believe otherwise, I'm all ears. But I'm also really interested in
> >> hearing other things all y'all might suggest on how to go about this.
> >>
> >> Network physical configuration:
> >> DS3 >> HP 2524 switch >> Sidewinder firewall >> HP 2524 switch >>
> >> Barracuda web filter >> HP 3400cl switch >> production VLANs
> >>
> >> Network logical configuration:
> >> No VLANs externally, 9 VLANs that run over the 3400cl and 18
> >> VLANs (the ones on the 3400cl, plus 9 for test/dev/other) that run on
> >> the internal HP 2524. The firewall is a HA pair (active/passive) and
> >> has a VLANed interface to the HP 2524 - it sees all of the VLANs.
> >>
> >> Other data:
> >> I've got ntop running on two different points on the network -
> >> the external HP 2524, and the HP 3400cl - no load anomalies for the
> >> LAN or Internet connection noted.
> >>
> >> Testing methodology:
> >> I have placed a FreeBSD box with a public IP address external to
> >> the firewall, and two FreeBSD boxes internal to the firewall on
> >> different VLANs.
> >> One of the internal FreeBSD boxes is on a VLAN that
> >> doesn't traverse the 3400cl, and the other is placed in a VLAN that
> >> does - both VLANs transit the Barracuda, as do all staff machines.
> >> Each box has cURL installed (there's a version for Windows as well),
> >> and is given an identical list of about 2100 unique (http://fqdn only
> >> - not http://fqdn/somepath) URLs to resolve and download. I kick off
> >> the batch files manually - and simultaneously.
> >> The batch file is simple:
> >>     date > /root/out.txt
> >>     /usr/local/bin/curl -K /root/urls.txt >> /root/out.txt
> >>     date >> /root/out.txt
> >> The entries are all formatted similarly, e.g.:
> >>     url = "http://www.google.com"
> >>     -s
> >>     -w = "%{url_effective}\t%{time_total}\t%{time_namelookup}\n"
> >>     -o = /dev/null
> >> The output looks like this:
> >>     http://www.google.com 0.093 0.066
> >> Downloaded data is dumped to /dev/null, but I capture the timings
> >> for name resolution and the total transaction so that if I want I can
> >> analyze them later. I used this method before to identify a problem
> >> with the DNS proxy on the firewall, so thought this would be a useful
> >> method to do the same thing.
> >> All three boxes are using Google for name resolution: 8.8.8.8 -
> >> so that I can eliminate variances based on possible problems with our
> >> AD DNS infrastructure - I don't think there are any, but....
> >> Currently, our AD DNS points to 8.8.8.8 for its resolvers, but
> >> was originally pointed at our ISP's DNS - that change doesn't seem to
> >> have made a difference in staff experience.
> >> I gathered the URLs from my syslogs, so they are real sites that
> >> people here visit.
> >>
> >> The problem with the results from the methodology:
> >> Using the same data files each time, timings across all three
> >> boxes have varied wildly. On Friday of last week, each of the three
> >> boxes took 40 minutes to run through the list of URLs.
> >> On Tuesday they
> >> each took roughly three hours. Today the external box took 40 minutes
> >> and one of the internal boxes took about 3 hours, and the other
> >> internal machine hadn't finished by the time I left work - cURL hung
> >> on that machine and I'm going to rebuild it, as it had been mothballed
> >> and only revived for this test, and really needs updating. Because
> >> there is no consistency in the data, I cannot draw any conclusions.
> >> I'm going to try a few more runs, but definitely feel the need for a
> >> different approach.
> >>
> >> Any thoughts you might have will be appreciated. I'm out for the next
> >> couple of days, so won't be able to try any suggestions until next
> >> week, but would love to hear from folks on this.
> >>
> >> Thanks,
> >>
> >> Kurt
> >>
> > ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
> > ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/> ~
> >
> > ---
> > To manage subscriptions click here:
> > http://lyris.sunbelt-software.com/read/my_forums/
> > or send an email to [email protected]
> > with the body: unsubscribe ntsysadmin
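
For anyone wanting to dig into the out.txt files from the quoted methodology, here is a rough Python sketch that separates name-lookup time from total time per URL. It assumes the tab-separated format shown in the thread (url_effective, time_total, time_namelookup) and skips anything else, such as the date stamps at the top and bottom of the file. The function names, the 1-second "slow DNS" threshold, and the command-line usage are assumptions made for illustration, not part of Kurt's setup.

```python
import statistics
import sys


def parse_timings(lines):
    """Return a list of (url, time_total, time_namelookup) tuples.

    Only lines with exactly three tab-separated fields whose last two
    fields parse as floats are kept; date stamps and blanks are skipped.
    """
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        try:
            rows.append((parts[0], float(parts[1]), float(parts[2])))
        except ValueError:
            continue
    return rows


def summarize(rows, slow_dns=1.0):
    """Summarize timings; slow_dns is an arbitrary threshold in seconds."""
    if not rows:
        return {"count": 0}
    totals = [total for _, total, _ in rows]
    lookups = [dns for _, _, dns in rows]
    total_sum = sum(totals)
    return {
        "count": len(rows),
        "median_total": statistics.median(totals),
        "median_dns": statistics.median(lookups),
        # Fraction of all elapsed time that was spent in name lookup.
        "dns_share": sum(lookups) / total_sum if total_sum else 0.0,
        "slow_dns_urls": [url for url, _, dns in rows if dns >= slow_dns],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_timings.py /root/out.txt
    with open(sys.argv[1]) as fh:
        summary = summarize(parse_timings(fh))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Running something like this against the out.txt from each of the three boxes and comparing dns_share between a 40-minute run and a 3-hour run would indicate whether the slow runs are dominated by name resolution or by the transfers themselves.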
