On Mon, Aug 24, 2009 at 12:29:19PM -0600, Kelly Martin wrote:
> I just experienced a hard drive failure on one of my FreeBSD 7.2
> production servers with no backup! I am so mad at myself for not
> backing up!!

Welcome to the club. :-)

> Now it's a salvage operation. Here are the type of errors
> I was getting on the console, over-and-over:
>
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
> completing request directly
> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
> completing request directly
> ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
> g_vgs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5

It _could_ just be a bad or improperly connected SATA cable. Try
changing or re-seating the cable.

Read errors cannot damage your data, but write errors can! Immediately
stop all writing to the disk. Re-mount the partitions on that disk as
read-only, or unmount them.

To see if the disk really is broken, install sysutils/smartmontools and
run 'smartctl -a' on the disk. If you see errors in its report (e.g.
reallocated sectors), the disk is dying and should be unplugged to
prevent it from getting worse.

> My question: what kind of checks and/or repair tools should I run on
> the damaged drive after it's mounted?

As others have mentioned, first make a copy (with the disk unmounted) of
the partitions on that disk with dd, saving them to another drive. That
way you can experiment with the data without further deterioration of
the original. You can use this disk image e.g. as a vnode-backed memory
disk, see mdconfig(8).

If you cannot get a good copy of the disk partitions, it might be a good
idea to get a quote from a professional hard drive data recovery company
to do that for you. I've never had occasion to try this (hooray for
backups) but I've heard it can be quite expensive. :-/

Try using fsck_ffs on (copies of) the disk image to see if that can
repair the damage. If the damage is beyond repair for fsck_ffs, you have
a real problem.
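
The copy-then-check steps above could look roughly like the sketch below. Treat it as an illustration, not a tested recipe: the image path /backup/ad4s1f.img, the md unit number 9, the mount point /mnt, the dd block size and the 'conv=noerror,sync' option are all my assumptions, and the partition name is simply the one from your console messages. It only prints the commands unless you set DRYRUN=0, and it does no error handling.

```shell
#!/bin/sh
# Sketch only. All paths, the md unit and the dd options are assumptions;
# adjust them to your system before running anything for real.
DISK=${DISK:-/dev/ad4}           # the failing disk
PART=${PART:-/dev/ad4s1f}        # partition named in the console error
IMG=${IMG:-/backup/ad4s1f.img}   # hypothetical path on a different, healthy drive
MDUNIT=${MDUNIT:-9}              # hypothetical free md(4) unit number
DRYRUN=${DRYRUN:-1}              # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rescue() {
    # Look at the SMART report first (e.g. reallocated sectors).
    run smartctl -a "$DISK"
    # Image the unmounted partition; keep going past read errors and
    # pad unreadable blocks so offsets in the image stay aligned.
    run dd if="$PART" of="$IMG" bs=64k conv=noerror,sync
    # Experiment on a copy of the image, never on the only one.
    run cp "$IMG" "$IMG.work"
    # Attach the copy as a vnode-backed memory disk, see mdconfig(8).
    run mdconfig -a -t vnode -f "$IMG.work" -u "$MDUNIT"
    # Check without writing anything (-n answers "no" to every question).
    run fsck_ffs -n "/dev/md$MDUNIT"
    # Have a look at the result, read-only.
    run mount -o ro "/dev/md$MDUNIT" /mnt
}

rescue
```

If the read-only check looks hopeful, you could run fsck_ffs without -n on the working copy and let it attempt repairs; the first image stays untouched in case that makes things worse.
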
Of course, if you have a good disk image, your data is still there, but
you might have to use a forensics program like sysutils/sleuthkit or
hexdump to try and piece files together. And even then you cannot be
sure that there is no corrupted data in the files themselves. Good luck
with that. :-(

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)