Eric Crist wrote:
Hey folks,

First, please 'reply-all' as I'm not on the list.

I've got a backup server that, every night, offloads things to a secondary, USB-attached hard disk. We've got two of these disks, which we rotate so as to have a fairly recent off-site version in the event of a disaster. One of the two drives has started to cause the backup server to core dump and reboot; the other works fine. I tried repartitioning and reformatting the problematic drive, but the problems persist.

Here is what I get from a kgdb:

[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC-> sudo kgdb kernel.debug /var/crash/vmcore.17 [GDB will not be able to debug user-mode threads: /usr/lib/ Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
panic: softdep_deallocate_dependencies: dangling deps
cpuid = 0
Uptime: 11d20h37m38s
Physical memory: 1011 MB
Dumping 201 MB: 186 170 154 138 122 106 90 74 58 42 26 10

#0  doadump () at pcpu.h:195
195        __asm __volatile("movl %%fs:0,%0" : "=r" (td));

Any insight is appreciated.  uname -a is:

FreeBSD hostname 7.0-RELEASE-p3 FreeBSD 7.0-RELEASE-p3 #1: Tue Jul 15 13:53:28 CDT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386

See the Developers' Handbook for more details on how to report panics (you also need the backtrace, and it may help to catch the problem earlier if you turn on debugging).

However, this kind of panic can happen if the drive is marginal, e.g. if it loses or corrupts I/O in transit. Try building the fsx tool from /usr/src/tools/regression/fsx and running it against the problem disk for a few days, or even running multiple instances on different files at once to really stress it. It does lots of I/O to a file and verifies that the file remains consistent throughout. It won't touch the whole drive, though, so if only part of the disk is bad it may not catch the problem.
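For example, something along these lines (assuming the system sources are installed in /usr/src and the suspect disk is mounted at /mnt/usbdisk — adjust both paths to your setup):

```shell
# Build fsx from the system sources (assumes /usr/src is populated)
cd /usr/src/tools/regression/fsx && make

# Hammer the suspect disk with several concurrent instances, each
# working on its own file; fsx aborts if a read ever returns data
# that differs from what it previously wrote there.
./fsx /mnt/usbdisk/fsx-test-1 &
./fsx /mnt/usbdisk/fsx-test-2 &
./fsx /mnt/usbdisk/fsx-test-3 &
wait
```

By default fsx keeps going until interrupted; pass -N with an operation count if you want a bounded run.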

To cover the rest of the disk, you could generate a large random file on another disk, record its md5 checksum, write copies of it to the bad disk until it is full or nearly full, then read each copy back and compare checksums. A small script could run this in a loop.
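A minimal sketch of that idea — TARGET, COPIES, and the 8 MB reference size are placeholders; on the real run, point TARGET at the suspect drive's mount point and raise COPIES until the disk is nearly full:

```shell
#!/bin/sh
# Fill-and-verify sketch. Defaults are safe placeholders so the
# script can be dry-run; substitute real values for the actual test.
TARGET=${TARGET:-$(mktemp -d)}   # e.g. /mnt/usbdisk on the real run
COPIES=${COPIES:-5}              # raise until the disk is nearly full

# md5(1) on FreeBSD, md5sum(1) elsewhere; emit just the hash
cksum_of() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | awk '{print $1}'
    fi
}

# Generate the random reference file on a known-good disk
REF=$(mktemp)
dd if=/dev/urandom of="$REF" bs=1048576 count=8 2>/dev/null
GOOD=$(cksum_of "$REF")

# Write copies to the suspect disk until done (or it fills up)...
i=0
while [ "$i" -lt "$COPIES" ]; do
    cp "$REF" "$TARGET/copy.$i" || break
    i=$((i + 1))
done

# ...then read every copy back and flag any mismatch
for f in "$TARGET"/copy.*; do
    if [ "$(cksum_of "$f")" != "$GOOD" ]; then
        echo "CORRUPT: $f"
    fi
done
echo "checked $i copies"
```

Unmounting and remounting (or just filling well past the machine's RAM) between the write and read phases helps ensure the reads actually hit the disk rather than the buffer cache.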

Yet another option would be to put the disk under ZFS (or geli with data authentication enabled), since checksums are then verified on every read, which will catch data corruption anywhere on the disk.
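A rough sketch of the ZFS variant — the device name da0 is an assumption (check dmesg for the USB disk's actual name), and note that creating the pool destroys the disk's existing contents:

```shell
# Create a throwaway pool on the suspect disk; ZFS checksums every
# block it writes and verifies the checksum on every read.
zpool create -f usbtest /dev/da0

# Copy a pile of data onto it, then force a full re-read of
# everything in the pool:
zpool scrub usbtest
zpool status -v usbtest    # non-zero CKSUM counts mean corruption
```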

I'd validate those things before proceeding with the existing panic.
