On Tuesday, 8 November 2022 03:31:07 GMT Grant Edwards wrote:
> I've got an SSD that's failing, and I'd like to know what files
> contain bad blocks so that I don't attempt to copy them to the
> replacement disk.
> 
> According to e2fsck(8):
> 
>        -c     This option causes e2fsck to use badblocks(8) program to do
>               a read-only scan of the device in order to find any bad
>               blocks.  If any bad blocks are found, they are added to the
>               bad block inode to prevent them from being allocated to a
>               file or directory.  If this option is specified twice, then
>               the bad block scan will be done using a non-destructive
>               read-write test.
> 
> What happens when the bad block is _already_allocated_ to a file?
> 
> --
> Grant

Whether or not the block is currently allocated to a file, my understanding
is that on spinning disks the data in a bad block stays where it is unless
you've dd'ed zeros over it.  Even then, read or write operations can fail if
the block is too far gone.[1]  Some data recovery applications will try
reading a bad block in different patterns to retrieve what's there.  Once the
block has been marked as bad, the filesystem won't allocate it for new data
again.
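
As I read it, the HOWTO in [1] gets from a failing sector to a file by first
converting the sector's LBA into a filesystem block number.  A minimal Python
sketch of that arithmetic follows; the function name and the sample numbers
are made up for illustration, and it assumes 512-byte logical sectors unless
told otherwise.

```python
def lba_to_fs_block(lba, partition_start, fs_block_size, sector_size=512):
    """Convert a whole-disk LBA into a block number inside one partition.

    lba             -- failing sector as reported by the drive (e.g. in SMART logs)
    partition_start -- first sector of the partition (e.g. from fdisk -lu)
    fs_block_size   -- filesystem block size in bytes
    sector_size     -- logical sector size in bytes (assumed 512 here)
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    # Byte offset into the partition, rounded down to a whole filesystem block.
    return (lba - partition_start) * sector_size // fs_block_size


# Illustrative numbers only: sector 23487977, partition starting at sector 63,
# 4 KiB filesystem blocks.
print(lba_to_fs_block(23487977, 63, 4096))
```

On ext2/3/4 the resulting block number can then be given to debugfs (icheck
to find the owning inode, ncheck to turn the inode into a path).  A block
that belongs to no inode is simply free space.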

With SSDs the situation is less deterministic, because the drive's internal
wear-levelling firmware moves data around according to its own algorithms and
remaps bad blocks.  This is all transparent to the filesystem; the block
addresses the drive presents to the fs are logical rather than physical
anyway.  Bypassing the firmware controller to access individual cells on an
SSD requires specialist equipment and your own lab, although things may have
evolved since I last looked into this.

The general advice is to avoid powering down an SSD suspected of corruption
until all the data has been copied/recovered off it.  If you power it down,
the data on it may never be accessible again without the aforementioned lab.

BTW, running badblocks in read-write mode on an ailing/aged SSD may make
matters worse without much benefit, by accelerating wear and causing
additional cells to fail.  At the same time you would be relying on the
suspect firmware and its logical-to-physical map to reach the data on those
cells.  Data scrubbing (btrfs, zfs) and recent backups would probably be a
better strategy with SSDs.
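
For the original question (which files sit on bad blocks), one read-only
approach is to read every file through the filesystem and note which ones
return an error.  This is not something taken from e2fsck or badblocks, just
a minimal Python sketch; the function names are hypothetical, and it assumes
the filesystem is mounted (ideally read-only) and that the files are not
already sitting in the page cache, in which case the read may never touch the
disk.

```python
import os


def read_fully(path, chunk_size=1 << 20):
    """Read a file to the end, discarding the data; raises OSError on failure."""
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass


def find_unreadable_files(root, reader=read_fully):
    """Return (path, errno) pairs for regular files under root that fail to read.

    'reader' is injectable so the scan logic can be exercised without a
    genuinely failing disk.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, FIFOs and the like; only regular files
            # hold data blocks worth testing.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                reader(path)
            except OSError as exc:
                failures.append((path, exc.errno))
    return failures
```

Calling find_unreadable_files("/mnt/old") (the mount point is an example)
returns the paths to leave out of the copy.  A failing block normally shows
up as errno.EIO, while other values such as EACCES point to permission
problems rather than the disk.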


[1] https://www.smartmontools.org/wiki/BadBlockHowto
