Have you looked at the NIC offload settings?  I recently had a similar
problem on my home machine.  I was getting a lot of intermittent DNS
failures, though TCP connectivity (once established) seemed OK.  I had
switched among the ISP DNS, Google DNS, and OpenDNS without any better
results, so I figured it was something with my new equipment.  I had
recently replaced the NIC, switch, and router because of lightning, so
where to look wasn't immediately evident.

I eventually fired up Wireshark and saw that header checksums were often
showing up as zeroes.  After disabling offloading, everything was OK again.
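
In case it helps anyone checking a capture by hand, here is a minimal
sketch of the standard Internet checksum (RFC 1071) applied to an IPv4
header.  Python is just my choice for illustration, the function names
are made up for this example, and it assumes the checksum in question is
the IPv4 header checksum rather than the TCP or UDP one.  Keep in mind
that a capture taken on the sending machine itself will usually show
outgoing checksums as zero or wrong whenever offload is on, since the NIC
fills them in after the capture point, so a zero seen there is not by
itself proof of a bad packet on the wire.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Return the RFC 1071 one's-complement checksum over an IPv4 header.

    To compute a fresh value, zero the checksum field (bytes 10-11) first.
    To verify, pass the header exactly as captured; a valid header yields 0.
    """
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    # Fold any carries back into the low 16 bits (one's-complement add).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_field_is_zero(header: bytes) -> bool:
    """True when the checksum field itself was left as 0x0000."""
    return header[10:12] == b"\x00\x00"


def header_checksum_ok(header: bytes) -> bool:
    """True when the captured header, checksum included, verifies."""
    return ipv4_header_checksum(header) == 0
```

Comparing a capture from the affected machine with one taken from a
mirrored switch port would show whether the zeroes actually leave the box.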


Jeff


On Mon, Jun 13, 2011 at 12:51 AM, Kurt Buff <[email protected]> wrote:

> Pardon the lateness of this reply (I've been out of town for a couple
> of days), but no, the lack of consistency leaves me with questions,
> not answers - because I have only one instance of inside being slower
> than outside. Let me be a bit more specific:
>
> Date         Machine   Elapsed
> 2011-05-03   Ext       40m
> 2011-05-03   Int1      40m
>
> 2011-05-06   Ext       2h 46m
> 2011-05-06   Int1      2h 46m
> 2011-05-06   Int2      2h 46m
>
> 2011-05-08   Ext       40m
> 2011-05-08   Int1      2h 46m
> 2011-05-08   Int2      DNF
>
> So, before I left work on Wednesday, I scheduled this same task for
> 12:30 daily on each of the machines. I'll be using the new data for a
> deeper analysis.
>
> Unfortunately, my manager just emailed me that he's "tweaked our DNS
> setup" while I was out - who knows how, or how that affected things.
>
> Sigh.
>
> Kurt
>
>
> On Thu, Jun 9, 2011 at 03:25, Andrew S. Baker <[email protected]> wrote:
> > Isn't the fact that there is no consistency in the data from systems
> > placed at different points in your network *helpful* in determining where
> > there is a potential slowdown?
> >
> > Users complain about slow performance, and your logging shows that speeds
> > outside are faster than those inside.
> >
> > This would indicate that something on the inside is a bottleneck at that
> > time...
> >
> > It would seem to me that you have more than enough data to drill down and
> > find out where the issues are taking place.
> >
> > ASB (Professional Bio)
> > Harnessing the Advantages of Technology for the SMB market...
> >
> >
> >
> >
> > On Thu, Jun 9, 2011 at 1:12 AM, Kurt Buff <[email protected]> wrote:
> >>
> >> All,
> >>
> >> I'm in need of a new approach to troubleshooting staff complaints
> >> about intermittent slowness of web browsing. We have about 200 staff
> >> members on site. The symptoms are intermittent at best, but include
> >> some generalized slowness in page loads, and occasional complete page
> >> misses - that is, staff report that a page fails to load at all, with
> >> a message that the system can't find the page, but hitting refresh
> >> will usually bring the page right up.
> >>
> >> My current testing methodology seems to be getting me nowhere and
> >> causing me to lose hair in great chunks. I outline the methodology
> >> below because someone might spot a flaw in it.
> >>
> >> I'm not well versed in reading packets, so haven't yet resorted to
> >> wireshark or tcpdump, but my testing so far leads me to believe that I
> >> won't find much that way. If your reading of the situation leads you
> >> to believe otherwise, I'm all ears. But I'm also really interested in
> >> hearing other things all y'all might suggest on how to go about this.
> >>
> >> Network physical configuration:
> >>     DS3 >> HP 2524 switch >> Sidewinder firewall >> HP 2524 switch >>
> >> Barracuda web filter >> HP 3400cl switch >> production VLANs
> >>
> >> Network logical configuration:
> >>     No VLANs externally, 9 VLANs that run over the 3400cl and 18
> >> VLANs (the ones on the 3400cl, plus 9 for test/dev/other) that run on
> >> the internal HP 2524. The firewall is a HA pair (active/passive) and
> >> has a VLANed interface to the HP 2524 - it sees all of the VLANs.
> >>
> >> Other data:
> >>     I've got ntop running on two different points on the network -
> >> the external HP 2524, and the HP 3400cl - no load anomalies for the
> >> LAN or Internet connection noted.
> >>
> >> Testing methodology:
> >>     I have placed a FreeBSD box with a public IP address external to
> >> the firewall, and two FreeBSD boxes internal to the firewall on
> >> different VLANs. One of the internal FreeBSD boxes is on a VLAN that
> >> doesn't traverse the 3400cl, and the other is placed in a VLAN that
> >> does - both VLANs transit the Barracuda, as do all staff machines.
> >> Each box has cURL installed (there's a version for Windows as well),
> >> and is given an identical list of about 2100 unique (http://fqdn only
> >> - not http://fqdn/somepath) URLs to resolve and download. I kick off
> >> the batch files manually - and simultaneously.
> >>     The batch file is simple:
> >>          date > /root/out.txt
> >>          /usr/local/bin/curl -K /root/urls.txt >> /root/out.txt
> >>          date >> /root/out.txt
> >>     The entries are all formatted similarly, e.g.:
> >>          url = "http://www.google.com";
> >>          -s
> >>          -w = "%{url_effective}\t%{time_total}\t%{time_namelookup}\n"
> >>          -o = /dev/null
> >>     The output looks like this:
> >>          http://www.google.com   0.093   0.066
> >>     Downloaded data is dumped to /dev/null, but I capture the timings
> >> for name resolution and the total transaction so that if I want I can
> >> analyze them later. I used this method before to identify a problem
> >> with the DNS proxy on the firewall, so thought this would be a useful
> >> method to do the same thing.
> >>     All three boxes are using Google for name resolution: 8.8.8.8 -
> >> so that I can eliminate variances based on possible problems with our
> >> AD DNS infrastructure - I don't think there are any, but....
> >>     Currently, our AD DNS points to 8.8.8.8 for its resolvers, but
> >> was originally pointed at our ISP's DNS - that change doesn't seem to
> >> have made a difference in staff experience.
> >>     I gathered the URLs from my syslogs, so they are real sites that
> >> people here visit.
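
Since the output already records time_namelookup next to time_total for
every URL, one way to see whether name resolution is where the time goes
is to summarize those two columns per run.  What follows is only a rough
sketch in Python (my choice, not part of the original setup): the names
parse_timings and summarize and the 1-second "slow lookup" threshold are
assumptions for illustration.  It expects the three-field lines produced
by the -w format above and skips anything else, such as the two date
stamps.

```python
import statistics
from typing import Dict, Iterable, List, NamedTuple


class Timing(NamedTuple):
    url: str
    total: float       # curl %{time_total}, in seconds
    namelookup: float  # curl %{time_namelookup}, in seconds


def parse_timings(lines: Iterable[str]) -> List[Timing]:
    """Parse 'url  time_total  time_namelookup' lines, skipping the rest."""
    rows = []
    for line in lines:
        parts = line.split()  # URLs contain no whitespace
        if len(parts) != 3:
            continue  # e.g. the `date` stamps at the top and bottom
        try:
            rows.append(Timing(parts[0], float(parts[1]), float(parts[2])))
        except ValueError:
            continue  # three fields, but not numeric timings
    return rows


def summarize(rows: List[Timing], slow_dns: float = 1.0) -> Dict[str, object]:
    """Summarize one run; slow_dns is an arbitrary threshold in seconds."""
    if not rows:
        raise ValueError("no timing rows found")
    total_sum = sum(r.total for r in rows)
    lookup_sum = sum(r.namelookup for r in rows)
    return {
        "count": len(rows),
        "median_total": statistics.median(r.total for r in rows),
        "median_namelookup": statistics.median(r.namelookup for r in rows),
        # Fraction of all elapsed time that was spent resolving names.
        "dns_share": lookup_sum / total_sum if total_sum else 0.0,
        "slow_lookups": [r.url for r in rows if r.namelookup >= slow_dns],
    }
```

Running it over each box's file, for example
summarize(parse_timings(open("/root/out.txt"))), gives numbers that can
be compared between a 40-minute run and a 3-hour run.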
> >>
> >> The problem with the results from the methodology:
> >>     Using the same data files each time, timings across all three
> >> boxes have varied wildly. On Friday of last week, each of the three
> >> boxes took 40 minutes to run through the list of URLs. On Tuesday they
> >> each took roughly three hours. Today the external box took 40 minutes
> >> and one of the internal boxes took about 3 hours, and the other
> >> internal machine hadn't finished by the time I left work - cURL hung
> >> on that machine and I'm going to rebuild it, as it had been mothballed
> >> and only revived for this test, and really needs updating. Because
> >> there is no consistency in the data, I cannot draw any conclusions.
> >> I'm going to try a few more runs, but definitely feel the need for a
> >> different approach.
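
The whole-batch elapsed time hides which requests were actually slow, so
one possible variation is to line up the per-URL totals from the external
box against an internal box for the same run and list only the URLs that
were much slower inside.  This is a sketch under assumptions, not a
finished tool: the function names and the factor and floor thresholds are
invented for illustration, and it assumes both files contain the same
three-field lines that the -w format produces.

```python
from typing import Dict, Iterable, List, Tuple


def load_totals(lines: Iterable[str]) -> Dict[str, float]:
    """Map each URL to its time_total, skipping non-matching lines."""
    totals = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # date stamps or anything else unexpected
        try:
            totals[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return totals


def slower_inside(
    ext: Dict[str, float],
    internal: Dict[str, float],
    factor: float = 3.0,  # assumed: "much slower" means 3x the outside time
    floor: float = 0.5,   # assumed: ignore differences under half a second
) -> List[Tuple[str, float, float]]:
    """Return (url, ext_total, int_total), largest gap first."""
    flagged = []
    for url, ext_t in ext.items():
        int_t = internal.get(url)
        if int_t is None:
            continue  # URL missing from the internal run (e.g. a hung batch)
        if int_t >= floor and int_t >= factor * ext_t:
            flagged.append((url, ext_t, int_t))
    flagged.sort(key=lambda row: row[2] - row[1], reverse=True)
    return flagged
```

If the flagged URLs cluster around particular times or particular
destinations, that narrows where to look; if they are spread evenly, the
bottleneck is more likely something every request passes through.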
> >>
> >> Any thoughts you might have will be appreciated. I'm out for the next
> >> couple of days, so won't be able to try any suggestions until next
> >> week, but would love to hear from folks on this.
> >>
> >> Thanks,
> >>
> >> Kurt
> >>
> > ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
> > ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/>  ~
> >
> > ---
> > To manage subscriptions click here:
> > http://lyris.sunbelt-software.com/read/my_forums/
> > or send an email to [email protected]
> > with the body: unsubscribe ntsysadmin

