Isn't the fact that there is no consistency in the data from systems placed at different points in your network *helpful* in determining where there is a potential slowdown?
Users complain about slow performance, and your logging shows that speeds outside are faster than those inside. This would indicate that something on the inside is a bottleneck at that time...

It would seem to me that you have more than enough data to drill down and find out where the issues are taking place.

*ASB* (Professional Bio <http://about.me/Andrew.S.Baker/bio>)
*Harnessing the Advantages of Technology for the SMB market...*

On Thu, Jun 9, 2011 at 1:12 AM, Kurt Buff <[email protected]> wrote:
> All,
>
> I'm in need of a new approach to troubleshooting staff complaints
> about intermittent slowness of web browsing. We have about 200 staff
> members on site, the symptoms are intermittent at best, but include
> some generalized slowness in page loads, and occasional complete page
> misses - that is, staff report that a page fails to load at all, with
> a message that the system can't find the page, but hitting refresh
> will usually bring the page right up.
>
> My current testing methodology seems to be getting me nowhere and
> causing me to lose hair in great chunks. I outline the methodology
> below because someone might spot a flaw in it.
>
> I'm not well versed in reading packets, so haven't yet resorted to
> wireshark or tcpdump, but my testing so far leads me to believe that I
> won't find much that way. If your reading of the situation leads you
> to believe otherwise, I'm all ears. But I'm also really interested in
> hearing other things all y'all might suggest on how to go about this.
>
> Network physical configuration:
>    DS3 >> HP 2524 switch >> Sidewinder firewall >> HP 2524 switch >>
>    Barracuda web filter >> HP 3400cl switch >> production VLANs
>
> Network logical configuration:
>    No VLANs externally, 9 VLANs that run over the 3400cl and 18
>    VLANs (the ones on the 3400cl, plus 9 for test/dev/other) that run on
>    the internal HP 2524.
>    The firewall is an HA pair (active/passive) and
>    has a VLANed interface to the HP 2524 - it sees all of the VLANs.
>
> Other data:
>    I've got ntop running on two different points on the network -
>    the external HP 2524, and the HP 3400cl - no load anomalies for the
>    LAN or Internet connection noted.
>
> Testing methodology:
>    I have placed a FreeBSD box with a public IP address external to
>    the firewall, and two FreeBSD boxes internal to the firewall on
>    different VLANs. One of the internal FreeBSD boxes is on a VLAN that
>    doesn't traverse the 3400cl, and the other is placed in a VLAN that
>    does - both VLANs transit the Barracuda, as do all staff machines.
>    Each box has cURL installed (there's a version for Windows as well),
>    and is given an identical list of about 2100 unique (http://fqdn only
>    - not http://fqdn/somepath) URLs to resolve and download. I kick off
>    the batch files manually - and simultaneously.
>    The batch file is simple:
>        date > /root/out.txt
>        /usr/local/bin/curl -K /root/urls.txt >> /root/out.txt
>        date >> /root/out.txt
>    The entries are all formatted similarly, e.g.:
>        url = "http://www.google.com"
>        -s
>        -w = "%{url_effective}\t%{time_total}\t%{time_namelookup}\n"
>        -o = /dev/null
>    The output looks like this:
>        http://www.google.com 0.093 0.066
>    Downloaded data is dumped to /dev/null, but I capture the timings
>    for name resolution and the total transaction so that if I want I can
>    analyze them later. I used this method before to identify a problem
>    with the DNS proxy on the firewall, so thought this would be a useful
>    method to do the same thing.
>    All three boxes are using Google for name resolution: 8.8.8.8 -
>    so that I can eliminate variances based on possible problems with our
>    AD DNS infrastructure - I don't think there are any, but....
>    Currently, our AD DNS points to 8.8.8.8 for its resolvers, but
>    was originally pointed at our ISP's DNS - that change doesn't seem to
>    have made a difference in staff experience.
>    I gathered the URLs from my syslogs, so they are real sites that
>    people here visit.
>
> The problem with the results from the methodology:
>    Using the same data files each time, timings across all three
>    boxes have varied wildly. On Friday of last week, each of the three
>    boxes took 40 minutes to run through the list of URLs. On Tuesday they
>    each took roughly three hours. Today the external box took 40 minutes
>    and one of the internal boxes took about 3 hours, and the other
>    internal machine hadn't finished by the time I left work - cURL hung
>    on that machine and I'm going to rebuild it, as it had been mothballed
>    and only revived for this test, and really needs updating. Because
>    there is no consistency in the data, I cannot draw any conclusions.
>    I'm going to try a few more runs, but definitely feel the need for a
>    different approach.
>
> Any thoughts you might have will be appreciated. I'm out for the next
> couple of days, so won't be able to try any suggestions until next
> week, but would love to hear from folks on this.
>
> Thanks,
>
> Kurt
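
One possible way to drill into the timing files already being collected is to line up the same URL across two boxes and see whether the extra time on the inside is in name resolution or in the rest of the transaction. The following is only a rough sketch, assuming each out.txt holds the three whitespace-separated fields shown in the quoted message (URL, time_total, time_namelookup) plus the two `date` lines. The function names, the 2x threshold, and the sample figures are made up for illustration and are not taken from the real runs.

```python
def parse_timings(text):
    """Map each URL to (time_total, time_namelookup).

    Lines that do not have exactly three fields, such as the `date`
    stamps at the top and bottom of out.txt, are skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        url, total, lookup = parts
        try:
            rows[url] = (float(total), float(lookup))
        except ValueError:
            continue
    return rows


def compare_hosts(external, internal, ratio=2.0):
    """List URLs whose internal total time is at least `ratio` times the
    external total time (2.0 is an arbitrary starting threshold).

    Each entry is (url, extra_total_seconds, extra_dns_seconds), sorted
    with the largest extra total time first.
    """
    slow = []
    for url, (ext_total, ext_dns) in external.items():
        if url not in internal or ext_total <= 0:
            continue
        int_total, int_dns = internal[url]
        if int_total / ext_total >= ratio:
            slow.append((url, int_total - ext_total, int_dns - ext_dns))
    slow.sort(key=lambda entry: entry[1], reverse=True)
    return slow


if __name__ == "__main__":
    # Invented sample data in the same shape as the curl output.
    external_text = (
        "Thu Jun  9 09:00:00 PDT 2011\n"
        "http://www.google.com\t0.093\t0.066\n"
        "http://www.example.com\t0.200\t0.050\n"
    )
    internal_text = (
        "http://www.google.com\t0.101\t0.070\n"
        "http://www.example.com\t1.400\t1.100\n"
    )
    external = parse_timings(external_text)
    internal = parse_timings(internal_text)
    for url, extra_total, extra_dns in compare_hosts(external, internal):
        print(f"{url}\textra total {extra_total:.3f}s\textra DNS {extra_dns:.3f}s")
```

If most of the extra time on the internal boxes shows up as extra DNS time, that points toward the resolver path through the firewall or web filter. If the DNS times match and only the totals differ, the transfer path itself is the more likely suspect. Comparing the two internal boxes with each other in the same way would help separate the 3400cl from the Barracuda.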
