On Thursday, 7 March 2019 14:45:31 GMT Rich Freeman wrote:
> On Thu, Mar 7, 2019 at 9:29 AM Grant Edwards <[email protected]> wrote:
> > On 2019-03-07, Mick <[email protected]> wrote:
> > > I can think of 3 things, but more learned M/L contributors may add to
> > > these:
> > >
> > > 1. The SATA connection has come loose. With time and movement it can
> > > come (slightly) adrift. Pushing it back in fully fixes this problem -
> > > also see No. 2 below.
> > >
> > > 2. The physical connector's contacts are beginning to oxidise. Reseat
> > > the SATA cable connectors both on the drive and any ribbons on the
> > > MoBo. This usually cleans any oxidisation.
> > >
> > > 3. The AHCI driver is deploying energy saving measures (aka. Aggressive
> > > Link Power Management - ALPM). Check the output of:
> > >
> > > cat /sys/class/scsi_host/host*/link_power_management_policy
> > >
> > > If it doesn't say 'max_performance' you'll need to revisit your BIOS
> > > settings and also PCIEASPM settings in the kernel.
> > >
> > > 4. Finally, there is a chance the PSU is playing up.
> >
> > Perhaps it's already been mentioned, but failing RAM can cause all
> > sorts of failures that might appear to be failing disks, failing network
> > cards, failing video cards, whatever. I'd run memtest86 for at least
> > 12 hours just to make sure...
>
> Failing RAM or failing power certainly can cause all manner of
> filesystem and other corruption. I've seen it firsthand and cleaning
> up from it is a total mess (usually best to restore from backup). I
> would definitely start with a memory test - if the motherboard is good
> then you can work outwards from there.
>
> From what I've heard SSDs can have bizarre failure modes since they
> interpose a logical layer between the physical storage media and the
> rest of the system. They're doing wear-leveling and so on behind the
> scenes, which means that if something goes wrong all kinds of bizarre
> problems can occur.
>
> I've also experienced a spinning hard drive exhibit lots of data
> corruption issues due to a faulty SATA interface (not sure where in
> the interface it was - chipset, port, or cable). ZFS saved me there with
> detection and resolution of errors, and when I moved the drive to a
> different HBA it worked fine after a scrub. I'd never seen anything
> like it before but it really made me appreciate ZFS (btrfs should have
> also worked) - I don't think mdadm would have had any way to resolve
> these errors easily, though maybe if I used a hex editor to figure out
> which drive was the bad one I might have been able to move it, wipe
> it, then re-add it to the mirror pair and let it rebuild. With ZFS I
> just got an email complaining about errors from zed and it just kept
> beating back the hordes until I fixed the connection. I forget if it
> dropped the drive or not - I didn't have any spares but if I did I
> suspect it would have swapped it in after enough problems.

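For anyone who wants to script the check in point 3 above, here is a rough sketch rather than a tested tool. The function name check_alpm_policy and its optional directory argument are my own inventions for illustration; only the sysfs path and the 'max_performance' value come from the earlier message. It assumes any other value means some link power saving may be in effect.

```shell
#!/bin/sh
# Report the SATA link power management policy of every SCSI/SATA host.
# check_alpm_policy is a hypothetical helper name; the optional argument
# (default /sys/class/scsi_host) exists only so the function can be
# pointed at another directory, e.g. for testing.
check_alpm_policy() {
    base="${1:-/sys/class/scsi_host}"
    found=0
    for f in "$base"/host*/link_power_management_policy; do
        # An unmatched glob stays literal, so skip anything unreadable.
        [ -r "$f" ] || continue
        found=1
        host=$(basename "$(dirname "$f")")
        policy=$(cat "$f")
        if [ "$policy" = "max_performance" ]; then
            echo "$host: $policy (ok)"
        else
            # Assumption: anything else means link power saving may be active.
            echo "$host: $policy (link power saving may be active)"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no link_power_management_policy files under $base"
    fi
    return 0
}

check_alpm_policy "$@"
```

Run it without arguments to inspect the live system. Any host that is not reported as "ok" is a candidate for the BIOS and kernel settings mentioned in point 3.
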
Good points raised re. faulty memory. Oxidisation can also occur on RAM
modules' contacts and reseating them works well. However, I can't recall
the OP mentioning corrupt data, which is usually the first thing observed
with faulty memory.

-- 
Regards,
Mick

