st...@prd.co.uk (Steve Blinkhorn) writes:

> Sep  5 16:56:49 trafalgar /netbsd: wd0a: error reading fsbn 1005056 of 
> 1005056-1005087 (wd0 bn 1005119; cn 997 tn 2 sn 17), retrying
> Sep  5 16:56:49 trafalgar /netbsd: wd0: (uncorrectable data error)
>
> The fsbn is mostly 1005056 but sometimes 1005086.
>
> Server response time is impacted.
>
> I've never had, so never tackled, this kind of issue before.   Advice
> much appreciated.

Backup and a new disk, as others have said.

But if you want to recover:

  print out the dd man page.  highlight what skip, seek (and maybe
  iseek) mean.  really!

  do all of this unmounted if you can, but mounted actually works.

  Use dd to read the disk, with a big blocksize.  Probably wd0d (c in
  non-x86), so blocks in dd and kernel blocks of the raw device match.

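  When you find the error, realize that the whole big block will fail,
  not just the bad sector.  Start over without a bs argument (dd then
  defaults to 512-byte blocks), with skip to start at the beginning of
  the bad big block.  skip counts in blocks of the current size, so
  convert the number first.  Find the first individual bad block.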

  write that block with dd from /dev/zero and seek.  Be really sure to
  use count=1.

  repeat the read/find/write cycle

  realize that this will be messy.  If the block you clear is inodes,
  you'll lose whole files.  If it's in a file, the file will silently
  get zeros.
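
The skip/seek bookkeeping above is easy to get wrong, so it may be worth
rehearsing on a scratch file first.  What follows is only a sketch under
assumptions: IMG, BIG_BS, FAILED_BIG and BAD are made-up values, and a
regular file never returns a read error, so the "bad" block is simply
declared.  On the real machine the input/output would be the raw whole-disk
device instead of IMG, and conv=notrunc would not be needed.

```sh
#!/bin/sh
# Rehearsal on a scratch file, NOT a real disk. IMG, BIG_BS, FAILED_BIG and
# BAD are hypothetical values chosen for illustration only.
IMG=${IMG:-/tmp/dd-rehearsal.img}
BIG_BS=65536      # the "big blocksize" used for the first pass
FAILED_BIG=1      # pretend big read number 1 (counting from 0) failed
BAD=130           # pretend this 512-byte block is the one that is really bad

# Build a 256-block (128 KiB) image filled with 'x' bytes.
dd if=/dev/zero bs=512 count=256 2>/dev/null | tr '\0' 'x' > "$IMG"

# skip= counts input blocks of the current bs, so convert the failed big
# block number into 512-byte blocks before rescanning without bs=.
START=$((FAILED_BIG * BIG_BS / 512))
echo "rescan from 512-byte block $START"

# Read one small block (dd defaults to 512 bytes). On a real disk the
# failing block is the one where dd reports an I/O error.
dd if="$IMG" of=/dev/null skip="$BAD" count=1 2>/dev/null

# seek= is the output-side counterpart of skip=: overwrite exactly block
# $BAD with zeros. count=1 keeps dd from writing past that one block.
# conv=notrunc is only here because IMG is a regular file.
dd if=/dev/zero of="$IMG" seek="$BAD" count=1 conv=notrunc 2>/dev/null
```

After the run, only block 130 of the scratch file is zeroed and its
neighbours still hold 'x' bytes, which is the property you want to be sure
of before pointing the same commands at a disk.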

As to how to find which file the block is in, there were programs ncheck
and icheck in the old days (sixth and seventh edition sometime).  It
looks like fsdb will do this.

Once I have more than one or two of these, I don't trust the disk.  But
I do use such disks as nth backups, for n being one more than I think I
need.
