In response to Barry Steyn <ba...@redbutton.co.za>:

> Hi guys,
> 
> We're having a serious problem here with our live server, it's very 
> sluggish all of a sudden. The problem is that Apache is *really* slow 
> responding to https requests but still fairly quick on http. We've 
> checked and ruled out all of the following:
> 
>     * CPU usage is normal and so is memory usage
>     * All other system daemons seem to be running just fine
>     * smartctl -a on both our disks (gmirror RAID 1) says health test
>       PASSED, gmirror status fine, smartd running for a week now with
>       nothing in the logs
>     * nothing strange in the apache access or error logs
>     * restarted apache, stopped jboss, upgraded apache to latest patch
>       level, even soft rebooted the box but to no avail
>     * nobody has done any upgrades, code changes, physical changes or
>       anything else to the box before the problem first manifested itself
>     * hosting problems - this problem even occurs when you do a wget on
>       the same box with https://localhost/... , in fact then it doesn't
>       even get to the SSL handshake as it doesn't get to complain about
>       the certificate mismatch
> 
> The weird thing is that the first time this happened a week ago, there 
> was only jboss/seam (which runs behind apache via mod_proxy_ajp) that 
> had an issue with sluggishness, all other https pages worked just fine. 
> Our tech team was desperate when nobody senior was available and decided 
> to hard reboot (power off and on again) the box after which it acted 
> really strangely (with disk errors in the logs, other system daemons 
> dying randomly) but eventually came right. A week later, sometime this 
> afternoon, the problems reoccurred but this time they are chronic, 
> nothing we do seems to help.
> 
> So, I keep thinking it must be a hardware problem. Not disk (or maybe it 
> is?), then perhaps faulty RAM? I always thought faulty RAM results in 
> nasty kernel panics, segfaults and other obvious symptoms but not a 
> sluggishness in one particular daemon...
> 
> Any ideas?

Given your description of the symptoms, I suspect hardware.

Sure, SMART is nice, but it's not foolproof.  It also doesn't monitor the
disk controller, which can have problems of its own.  The fact that the box
threw all sorts of disk errors after the hard reboot tells me that there is
something wrong that SMART and other hardware diagnostics aren't detecting.
I've seen systems fail in ways that defy all attempts to predict and detect.
I once saw a RAID system die in such a way that the system locked up tight,
in spite of the fact that there was a backup RAID card installed that should
have taken over.

If your budget allows, I'd make it top priority to migrate off that system
and onto a new one, then get that system into a dedicated testing setup to
see if you can isolate any problems in the hardware.

Whatever else you do, make sure you have good backups of any data on that
system right away.
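
If you want one more data point before you move off the box, it may help
to see whether the https delay sits in the TCP connect or in the TLS
handshake, since your wget test suggests it stalls before the handshake
finishes.  Here is a rough Python sketch of that idea.  The function name,
the default host/port and the timeout are my own assumptions, not anything
from your setup, and certificate checking is switched off only because you
are testing against localhost.

```python
import socket
import ssl
import time


def time_tls_phases(host="localhost", port=443, timeout=30.0):
    """Return (connect_seconds, handshake_seconds) for one TLS connection.

    Hypothetical helper for illustration; host, port and timeout are
    assumed defaults, adjust them to the server being tested.
    """
    # Certificate checks are disabled on purpose: testing via "localhost"
    # will not match the name in the real certificate.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    t0 = time.monotonic()
    raw = socket.create_connection((host, port), timeout=timeout)
    t1 = time.monotonic()  # TCP connection is established here
    try:
        # wrap_socket() performs the TLS handshake on an already
        # connected socket before it returns.
        tls = ctx.wrap_socket(raw, server_hostname=host)
        t2 = time.monotonic()  # handshake has completed here
        tls.close()
    except Exception:
        raw.close()
        raise
    return t1 - t0, t2 - t1
```

Calling time_tls_phases() a few times and printing both numbers should show
which phase is slow.  It won't tell you why, and it doesn't change my view
that the hardware is the prime suspect.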

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
