All,
I'm in need of a new approach to troubleshooting staff complaints
about intermittent slowness of web browsing. We have about 200 staff
members on site. The symptoms are intermittent, but include
generalized slowness in page loads and occasional complete page
misses: staff report that a page fails to load at all, with a message
that the system can't find the page, but hitting refresh will usually
bring the page right up.
My current testing methodology seems to be getting me nowhere and
causing me to lose hair in great chunks. I outline the methodology
below because someone might spot a flaw in it.
I'm not well versed in reading packets, so haven't yet resorted to
Wireshark or tcpdump, but my testing so far leads me to believe that I
won't find much that way. If your reading of the situation leads you
to believe otherwise, I'm all ears. But I'm also really interested in
hearing other things all y'all might suggest on how to go about this.
Network physical configuration:
DS3 >> HP 2524 switch >> Sidewinder firewall >> HP 2524 switch >>
Barracuda web filter >> HP 3400cl switch >> production VLANs
Network logical configuration:
No VLANs externally, 9 VLANs that run over the 3400cl and 18
VLANs (the ones on the 3400cl, plus 9 for test/dev/other) that run on
the internal HP 2524. The firewall is an HA pair (active/passive) and
has a VLANed interface to the HP 2524 - it sees all of the VLANs.
Other data:
I've got ntop running on two different points on the network -
the external HP 2524, and the HP 3400cl - no load anomalies for the
LAN or Internet connection noted.
Testing methodology:
I have placed a FreeBSD box with a public IP address external to
the firewall, and two FreeBSD boxes internal to the firewall on
different VLANs. One of the internal FreeBSD boxes is on a VLAN that
doesn't traverse the 3400cl, and the other is placed in a VLAN that
does - both VLANs transit the Barracuda, as do all staff machines.
Each box has cURL installed (there's a version for Windows as well),
and is given an identical list of about 2100 unique (http://fqdn only
- not http://fqdn/somepath) URLs to resolve and download. I kick off
the batch files manually - and simultaneously.
The batch file is simple:
date > /root/out.txt
/usr/local/bin/curl -K /root/urls.txt >> /root/out.txt
date >> /root/out.txt
The entries are all formatted similarly, e.g.:
url = "http://www.google.com"
-s
-w = "%{url_effective}\t%{time_total}\t%{time_namelookup}\n"
-o = /dev/null
The output looks like this:
http://www.google.com 0.093 0.066
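A config file in this format could be generated from a plain list of URLs, one per line. The following is only a sketch: the function name make_curl_config and the 30-second default are placeholders, and the max-time line is an addition that is not in my original file. It uses the long option names without dashes, which curl config files accept with "=" as a separator, and it relies on max-time applying to each transfer so that one unresponsive site cannot stall the whole run.

```sh
#!/bin/sh
# make_curl_config URL_LIST [MAX_SECONDS]
# Writes a curl config file to stdout: one timeout line at the top, then
# four lines per URL. MAX_SECONDS defaults to 30 (an arbitrary choice).
make_curl_config() {
    printf 'max-time = %s\n' "${2:-30}"
    while IFS= read -r u; do
        [ -n "$u" ] || continue          # skip blank lines in the list
        printf 'url = "%s"\n' "$u"
        printf 'silent\n'
        # Same three fields as the original -w line, tab-separated.
        printf 'write-out = "%%{url_effective}\\t%%{time_total}\\t%%{time_namelookup}\\n"\n'
        printf 'output = /dev/null\n'
    done < "$1"
}
```

Usage would be something like: make_curl_config /root/sites.txt 30 > /root/urls.txt, where sites.txt is a hypothetical file holding the bare URL list.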
Downloaded data is dumped to /dev/null, but I capture the timings
for name resolution and the total transaction so that if I want I can
analyze them later. I used this method before to identify a problem
with the DNS proxy on the firewall, so thought this would be a useful
method to do the same thing.
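For that later analysis, something along these lines might serve as a first pass. It is a sketch only: the function name summarize_timings and the 5-second "slow" threshold are placeholders, and it assumes the three fields are tab-separated, as the -w format specifies. Lines that do not have three fields, such as the two date stamps, are skipped.

```sh
#!/bin/sh
# summarize_timings FILE [THRESHOLD_SECONDS]
# Reads lines of the form "url<TAB>time_total<TAB>time_namelookup" and
# prints the request count, mean total time, mean lookup time, and the
# number of requests slower than the threshold (default 5 seconds).
summarize_timings() {
    awk -F '\t' -v limit="${2:-5}" '
        NF == 3 {
            n++
            total += $2
            dns += $3
            if (($2 + 0) > (limit + 0)) slow++
        }
        END {
            if (n == 0) { print "no timing lines found"; exit }
            printf "requests=%d mean_total=%.3f mean_dns=%.3f slow=%d\n",
                   n, total / n, dns / n, slow
        }
    ' "$1"
}
```

Running summarize_timings /root/out.txt on each of the three boxes would give one line per box to compare. A high mean lookup time would point toward name resolution, while a high mean total time with normal lookups would point elsewhere in the path.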
All three boxes are using Google for name resolution: 8.8.8.8 -
so that I can eliminate variances based on possible problems with our
AD DNS infrastructure - I don't think there are any, but....
Currently, our AD DNS points to 8.8.8.8 for its resolvers, but
was originally pointed at our ISP's DNS - that change doesn't seem to
have made a difference in staff experience.
I gathered the URLs from my syslogs, so they are real sites that
people here visit.
The problem with the results from the methodology:
Using the same data files each time, timings across all three
boxes have varied wildly. On Friday of last week, each of the three
boxes took 40 minutes to run through the list of URLs. On Tuesday they
each took roughly three hours. Today the external box took 40 minutes,
one of the internal boxes took about three hours, and the other
internal machine hadn't finished by the time I left work. cURL hung
on that machine, and I'm going to rebuild it: it had been mothballed
and only revived for this test, and it really needs updating. Because
there is no consistency in the data, I cannot draw any conclusions.
I'm going to try a few more runs, but definitely feel the need for a
different approach.
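One thing the existing output files might still show is whether a slow run is slow across the board or only for a handful of sites. A rough sketch for comparing two saved runs URL by URL follows. The function name compare_runs and the default of 20 rows are placeholders, and it assumes each out.txt was copied aside after its run and that the fields are tab-separated.

```sh
#!/bin/sh
# compare_runs FAST_RUN SLOW_RUN [ROWS]
# For every URL present in both files, prints
# "<increase in time_total><TAB><url>", largest increase first,
# limited to ROWS lines (default 20).
compare_runs() {
    awk -F '\t' '
        NF != 3 { next }                              # skip the date stamps
        FILENAME == ARGV[1] { first[$1] = $2; next }  # remember the first run
        ($1 in first) { printf "%.3f\t%s\n", $2 - first[$1], $1 }
    ' "$1" "$2" | sort -rn | head -n "${3:-20}"
}
```

For example, compare_runs friday.txt tuesday.txt (hypothetical file names) would list the sites that account for the most added time. A few large entries would suggest specific destinations; a uniform increase would suggest something in the shared path.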
Any thoughts you might have will be appreciated. I'm out for the next
couple of days, so won't be able to try any suggestions until next
week, but would love to hear from folks on this.
Thanks,
Kurt