Re: [gentoo-user] e2fsck -c when bad blocks are in existing file?

Wols Lists Tue, 08 Nov 2022 10:24:47 -0800

On 08/11/2022 13:20, Michael wrote:

On Tuesday, 8 November 2022 03:31:07 GMT Grant Edwards wrote:

I've got an SSD that's failing, and I'd like to know what files
contain bad blocks so that I don't attempt to copy them to the
replacement disk.


According to e2fsck(8):

        -c     This option causes e2fsck to use badblocks(8) program to do
               a read-only scan of the device in order to find any bad
               blocks.  If any bad blocks are found, they are added to the
               bad block inode to prevent them from being allocated to a
               file or directory.  If this option is specified twice, then
               the bad block scan will be done using a non-destructive
               read-write test.

What happens when the bad block is _already_allocated_ to a file?

--
Grant
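
One way to approach the question without involving the filesystem's bad
block list at all is simply to read every file end to end and note which
ones come back with an I/O error. The following is only a minimal sketch
in Python, not something taken from this thread: the function name
find_unreadable_files and the 1 MiB chunk size are made up for
illustration, and it assumes the filesystem is mounted (ideally
read-only) and that the files are not already sitting in the page cache,
since cached data can be returned without touching the disk. Reading
everything also puts load on a failing drive, so it may be best combined
with the actual copy rather than run as a separate pass.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Return (path, errno) pairs for regular files under root that
    raise an OSError while being read from start to finish.

    Note: permission problems also raise OSError, so check the errno
    (an I/O error is errno.EIO) before blaming the disk.
    """
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Read and discard; we only care whether it fails.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                unreadable.append((path, exc.errno))
    return unreadable
```

Calling find_unreadable_files("/mnt/old-ssd") (the mount point is a
placeholder) returns a list that is empty when every file read cleanly.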


Previously allocated to a file and now re-allocated or not, my understanding
is with spinning disks the data in a bad block stays there unless you've dd'ed
some zeros over it.  Even then read or write operations could fail if the
block is too far gone.[1]  Some data recovery applications will try to read
data off a bad block in different patterns to retrieve what's there.  Once the
bad block is categorized as such it won't be used by the filesystem to write
new data to it again.

With SSDs the situation is less deterministic, because the disk's internal
wear levelling firmware moves things around according to its algorithms to
remap bad blocks. This is all transparent to the filesystem, block addresses
sent to the fs are virtual anyway.  Bypassing the firmware controller to
access individual cells on an SSD requires specialist equipment and your own
lab, although things may have evolved since I last looked into this.

Which is actually pretty much exactly the same as what happens with
spinning rust.

The primary aim of a hard drive - SSD or spinning rust - is to save the
user's data. If the drive can't read the data it will do nothing save
returning a read error. Think about it - any other action will simply
make matters worse, namely the drive is actively destroying
possibly-salvageable data.

All being well, the user has raid or backups, and will be able to
re-write the file, at which point the drive will attempt recovery, as it
now has KNOWN GOOD data. If the write fails, the block will then be
added to the *drive internal* badblock list, and will be remapped
elsewhere.

MODERN DRIVES SHOULD NEVER HAVE AN OS-LEVEL BADBLOCKS LIST. If they do,
something is seriously wrong, because the drive should be hiding it from
the OS.


The general advice is to avoid powering down an SSD which is suspected of
corruption, until all the data is copied/recovered off it first.  If you power
it down, data on it may never be accessible again without the aforementioned
lab.

Seriously, this is EXTREMELY GOOD advice. I don't know whether it is
still true, but there have been plenty of stories in the past about
SSDs, when they get too many errors, they self-destruct on power-down!!!

This imho is a serious design fault - you can't recover data from an SSD
that won't boot - but the fact is it appears to be a deliberate decision
by the manufacturers.


BTW, running badblocks in read-write mode on an ailing/aged SSD may exacerbate
the problem without much benefit by accelerating wear and causing additional
cells to fail.  At the same time you could be relying on the suspect disk
firmware to access via its virtual map the data on some of its cells.  Data
scrubbing (btrfs, zfs) and recent backups would probably be a better strategy
with SSDs.

Yup. If you suspect badblocks have damaged your data, you need backups
or raid. And then don't worry about it - apart from making sure your
drives look healthy and replacing any that are dodgy.

Just make sure you interpret smartmontools data correctly - perfectly
healthy drives can drop dead for no apparent reason, and drives that
look at death's door will carry on for ever. In particular, read errors
aren't serious unless they are accompanied by a growing number of
relocation errors. If the relocation number jumps, watch it. If it
doesn't move while you're watching, it was probably a glitch and the
drive is okay. But use your head and be sensible. Any sign of regular
failed writes, BIN THE DRIVE.
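
The "jumps, then watch whether it keeps moving" rule of thumb can be
written down as a small sketch. This is only an illustration: the
function name and the wording of the verdicts are invented here, and it
assumes you have already collected a chronological list of relocation
counts yourself, for example the raw value of SMART attribute 5
(Reallocated_Sector_Ct) from repeated runs of smartctl -A. How often to
sample is left to your own judgement.

```python
def assess_reallocation_trend(readings):
    """Classify a chronological list of reallocated-sector counts.

    readings: oldest first, newest last (hypothetical input format).
    """
    if len(readings) < 2:
        return "not enough data"
    if readings[-1] > readings[-2]:
        # The count moved between the last two looks: keep watching.
        return "still growing: watch closely or replace"
    if readings[-1] > readings[0]:
        # It jumped at some point but has not moved since.
        return "jumped, then stable: probably a glitch"
    return "stable"
```

For example, readings of [0, 5, 5, 5] fall into the "jumped, then
stable" case, while [0, 5, 9] is still growing. It deliberately says
nothing about failed writes, which per the above are reason enough on
their own to bin the drive.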

(I think my 8TB drive says 1 read error per less-than-two end-to-end
scans is well within spec...)
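
That figure is easy to sanity-check with a bit of arithmetic. The sketch
below assumes an unrecoverable read error rate of one per 10^14 bits
read, which is a commonly quoted consumer-drive figure and NOT something
taken from this particular drive's datasheet, and it takes 8TB to mean
8 * 10^12 bytes. The function name is made up for illustration.

```python
def full_scans_per_expected_error(capacity_bytes, bits_per_error=1e14):
    """How many end-to-end reads of the drive add up to one expected
    unrecoverable read error, given a spec of one error per
    bits_per_error bits read (assumed value, see above)."""
    bits_per_scan = capacity_bytes * 8
    return bits_per_error / bits_per_scan


scans = full_scans_per_expected_error(8e12)
print(f"{scans:.2f} full scans per expected read error")
```

Under those assumptions the result is a little over one and a half
scans, which is consistent with "less than two".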


Cheers,
Wol
