Re: No I/O errors reported after SATA link hard reset

Gionatan Danti Thu, 17 Aug 2017 07:16:07 -0700

Hi Tejun,

On 17-08-2017 14:48 Tejun Heo wrote:

> Recovered errors aren't reported as IO errors and at least from link
> state proper there's no way for the driver to tell apart link
> glitches and buffer-erasing power issues.

Ok, so *this* is the root cause of the problem: libata cannot distinguish
spurious link renegotiations from brief power-loss/power-up events. Out of
curiosity: is this a SATA-specific problem (i.e., in the SATA
specification), or are SAS disks affected as well?

>> - why the scsi midlevel does not respond to a power loss event by
>> immediately offlining the disks?
>
> Because we don't wanna be ditching disks on temporary link glitches,
> which do happen once in a while.

Is there any chance of reporting I/O errors to the upper layers *without*
offlining the device? That way, upper layers (e.g., MDRAID) could act in a
more informed way. For example, a single-disk device would simply retry the
failed operation, while MDRAID could take the "badblocks" code path to
deal with the error.

> So, the right way to deal with the problem probably is making use of
> the SMART counter which indicates power loss events and verify that
> the counter hasn't increased over link issues.  If it changed, the
> device should be detached and re-probed, which will make it come back
> as a different block device.  Unfortunately, I haven't had the chance
> to actually implement that.

This is a very good idea; maybe I can implement it in userspace with a
simple, fast polling scheme (for example, every 60 seconds). Such polling
would not prevent all corruption scenarios, but it would at least inform
the user in a timely manner.
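
A rough sketch of what such a userspace poller might look like, assuming
smartmontools is installed and the disk exposes SMART attribute 12
(Power_Cycle_Count) in the usual `smartctl -A` table layout. The function
names and the choice of attribute are my own assumptions for illustration;
some drives report unexpected power loss under a different attribute ID,
so the ID is left as a parameter.

```python
import subprocess
import time

# Assumption: attribute 12 (Power_Cycle_Count) is the counter to watch.
POWER_CYCLE_ID = 12


def parse_raw_value(smartctl_output, attr_id=POWER_CYCLE_ID):
    """Return the raw value of a SMART attribute from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def read_power_cycles(device):
    """Run `smartctl -A <device>` and return the counter (None if unavailable)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return parse_raw_value(result.stdout)


def poll(device, interval=60):
    """Print a warning whenever the counter grows between two polls."""
    last = read_power_cycles(device)
    while True:
        time.sleep(interval)
        current = read_power_cycles(device)
        if last is not None and current is not None and current > last:
            print(f"WARNING: {device} power-cycle count went {last} -> {current}")
        if current is not None:
            last = current

# Hypothetical usage (needs root): poll("/dev/sda", interval=60)
```

This only detects the event after the fact, so it informs the user rather
than protecting the data; what to do on a warning (e.g. failing the member
out of the array) is left open here.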


Thanks.

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: [email protected] - [email protected]
GPG public key ID: FF5F32A8
