On Thursday, 7 March 2019 14:45:31 GMT Rich Freeman wrote:
> On Thu, Mar 7, 2019 at 9:29 AM Grant Edwards <[email protected]> wrote:
> > On 2019-03-07, Mick <[email protected]> wrote:
> > > I can think of a few things, but more learned M/L contributors may add
> > > to these:
> > > 
> > > 1. The SATA connection has come loose.  With time and movement it can
> > > come (slightly) adrift.  Pushing it back in fully fixes this problem -
> > > also see No. 2 below.
> > > 
> > > 2. The physical connector's contacts are beginning to oxidise.  Reseat
> > > the SATA cable connectors both on the drive and any ribbons on the MoBo.
> > > This usually cleans off any oxidisation.
> > > 
> > > 3. The AHCI driver is deploying energy saving measures (aka Aggressive
> > > Link Power Management - ALPM).  Check the output of:
> > >  cat /sys/class/scsi_host/host*/link_power_management_policy
> > > 
> > > If it doesn't say 'max_performance' you'll need to revisit your BIOS
> > > settings and also PCIEASPM settings in the kernel.
> > > 
> > > 4. Finally, there is a chance the PSU is playing up.
> > 
> > Perhaps it's already been mentioned, but failing RAM can cause all
> > sorts of failures that might appear to be failing disks, failing
> > network cards, failing video cards, whatever.  I'd run memtest86 for at
> > least 12 hours just to make sure...
> 
> Failing RAM or failing power certainly can cause all manner of
> filesystem and other corruption.  I've seen it firsthand and cleaning
> up from it is a total mess (usually best to restore from backup).  I
> would definitely start with a memory test - if the motherboard is good
> then you can work outwards from there.
> 
> From what I've heard SSDs can have bizarre failure modes since they
> interpose a logical layer between the physical storage media and the
> rest of the system.  They're doing wear-leveling and so on behind the
> scenes, which means that if something goes wrong all kinds of bizarre
> problems can occur.
> 
> I've also experienced a spinning hard drive exhibit lots of data
> corruption issues due to a faulty SATA interface (not sure where in
> the interface it was - chipset, port, or cable).  ZFS saved me there with
> detection and resolution of errors, and when I moved the drive to a
> different HBA it worked fine after a scrub.  I'd never seen anything
> like it before but it really made me appreciate ZFS (btrfs should have
> also worked) - I don't think mdadm would have had any way to resolve
> these errors easily, though maybe if I used a hex editor to figure out
> which drive was the bad one I might have been able to move it, wipe
> it, then re-add it to the mirror pair and let it rebuild.  With ZFS I
> just got an email complaining about errors from zed and it just kept
> beating back the hordes until I fixed the connection.  I forget if it
> dropped the drive or not - I didn't have any spares but if I did I
> suspect it would have swapped it in after enough problems.

Good points raised re. faulty memory.  Oxidisation can also occur on RAM 
modules' contacts and reseating them works well.  However, I can't recall the 
OP mentioning corrupt data, which is usually the first thing observed with 
faulty memory.
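
On the ALPM check in point 3 of my earlier list, here is a rough sketch for 
looping over every SATA host in one go.  It assumes the sysfs path quoted 
above; the function name check_alpm and its optional directory argument are 
only my illustration, not anything provided by the kernel or a package.

```shell
#!/bin/sh
# Print the link power management policy of each SCSI/SATA host and flag
# any host that is not set to max_performance.
# check_alpm is a hypothetical helper name; the optional first argument
# (base directory) exists only so the function can be tried on a copy.
check_alpm() {
    base="${1:-/sys/class/scsi_host}"
    for f in "$base"/host*/link_power_management_policy; do
        # Skip hosts that do not expose the attribute (or an unmatched glob).
        [ -r "$f" ] || continue
        policy=$(cat "$f")
        host=$(basename "$(dirname "$f")")
        if [ "$policy" = "max_performance" ]; then
            echo "$host: $policy"
        else
            echo "$host: $policy (power saving active)"
        fi
    done
}

check_alpm "$@"
```

Any host flagged as power saving is a candidate for the BIOS and kernel 
settings mentioned in that point.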

-- 
Regards,
Mick

