On Feb 6, 2012, at 8:15 AM, Ryan Merrell wrote:

> We have an Intel modular blade server. The chassis has 2x 3-disk RAID(5)
> arrays. Volume 1 is what the OS (FreeBSD 7.2) is installed on and Volume 2 is
> mounted at /usr. These two volumes are da0 and da1.
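For reference, the layout above (two 3-disk RAID-5 arrays) can be compared by usable capacity with two six-disk alternatives: a single 6-disk RAID-5 and a 6-disk RAID-10. This is a minimal Python sketch that assumes six identical disks, no hot spares, and the textbook formulas. RAID-5 gives up one disk per array to parity, and RAID-10 mirrors everything, so half the disks are usable. The helper names `raid5_usable` and `raid10_usable` are hypothetical.

```python
def raid5_usable(disks_per_array: int, arrays: int = 1) -> int:
    """Usable capacity, in whole disks, of one or more RAID-5 arrays.

    Each array loses one disk's worth of space to parity.
    """
    if disks_per_array < 3:
        raise ValueError("RAID-5 needs at least 3 disks per array")
    return arrays * (disks_per_array - 1)


def raid10_usable(disks: int) -> int:
    """Usable capacity, in whole disks, of a RAID-10 set (mirrored pairs)."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID-10 needs an even number of disks, at least 4")
    return disks // 2


# Six identical disks, measured in units of one disk's capacity.
layouts = {
    "2 x 3-disk RAID-5": raid5_usable(3, arrays=2),
    "1 x 6-disk RAID-5": raid5_usable(6),
    "6-disk RAID-10": raid10_usable(6),
}

for name, usable in layouts.items():
    print(f"{name}: {usable} of 6 disks usable")
```

Capacity is only one axis. RAID-10 gives up the most space, but it typically writes faster than parity RAID. Each of these layouts survives at least one disk failure.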

This doesn't matter directly to your issue, but a 3-disk RAID-5 setup is not a
great choice. With six disks available, you'd almost certainly do better with
either a single 6-disk-wide RAID-5 or a RAID-10.

> I got email notifications saying the web host I run in a jail hosted on this
> server was down. I try to SSH into it, but it fails. I ping it and I get a
> 50% return rate. So I log in to the management blade and start a virtual KVM
> session to get into the blade. Once I'm into the basehost blade, I cat
> dmesg.today and get a slew of errors. Here we go..
>
> (da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
> (da3:mpt0:0:6:1): Retrying Command (per Sense Data)
> (da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
> (da3:mpt0:0:6:1): CAM Status: SCSI Status Error
> (da3:mpt0:0:6:1): SCSI Status: Check Condition
> (da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
> (da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
> (da3:mpt0:0:6:1): Retrying Command (per Sense Data)
> (da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
> (da3:mpt0:0:6:1): CAM Status: SCSI Status Error
> (da3:mpt0:0:6:1): SCSI Status: Check Condition
> (da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
> (da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
> (da3:mpt0:0:6:1): Retries Exhausted
>
> As mentioned before, our two volumes are da0 and da1. /dev lists da2 and da3
> as well, but I have no idea what they are. How do I figure out what da3 is
> and what do the above error messages say about it? Someone on the forum asked
> me if the two volumes are on the same controller and the answer is yes, they
> are.

Check dmesg after a reboot, or take a look at "camcontrol devlist" or
"atacontrol list"; either ought to provide more information. Since you're also
using GEOM labels, "glabel status" is likely to be informative as well.

> GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
> GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
> GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
> GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
> Trying to mount root from ufs:/dev/da0s1a
> GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
> GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
> GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
> GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.
> GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
> GEOM_LABEL: Label for provider da1s1 is ufsid/4bd2077f23a6cc93.
> GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
> GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
> GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
> GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
> GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
> GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
> GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
> GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
> GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.
>
> Was root unmounted? What's going on here? Obviously there's some issue with
> da0, which is mounted at /. The server has been up and running fine, so why
> am I seeing "Trying to mount root from ufs:/dev/da0s1a"?

These are standard boot-time messages: GEOM is examining the disk labels to
figure out where the various filesystems live, and the kernel is mounting the
root filesystem. If the box hasn't rebooted, they appear in dmesg.today only
because the kernel message buffer still holds them from boot; they don't by
themselves mean root was unmounted.

> pid 93248 (httpd), uid 80: exited on signal 10
> pid 95624 (httpd), uid 80: exited on signal 10
> pid 97956 (httpd), uid 80: exited on signal 10
> pid 97935 (httpd), uid 80: exited on signal 10
> pid 96603 (httpd), uid 80: exited on signal 10
> pid 93210 (httpd), uid 80: exited on signal 10
> pid 98246 (httpd), uid 80: exited on signal 10
>
> This is apparently what's killing our webserver. Apache receives a signal 10
> and quits..
> Everything I've read says it's an issue with Apache trying to
> access RAM that it shouldn't or that doesn't exist.. Is there something else
> with the above da0 or da3 errors that would cause a SIGBUS on httpd?

That's unclear, but normally a failing disk will cause I/O to block, and the
httpd processes will simply hang, not crash. Most likely you've got a bug
lurking in one of the Apache modules you use (mod_php is a likely candidate).
Run a test instance of httpd under gdb with the -X flag (a single process that
doesn't detach from the console) and see whether you can gather better
information. Alternatively, unlimit coredumpsize and run gdb against the core
file to see what's causing the crash.

Regards,
-- 
-Chuck
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"