On Mon, Mar 17, 2025 at 08:15:51AM -0400, Kent Overstreet wrote:
> On Sun, Mar 16, 2025 at 11:00:20PM -0700, Christoph Hellwig wrote:
> > On Sat, Mar 15, 2025 at 02:41:33PM -0400, Kent Overstreet wrote:
> > > That's the sort of thing that process is supposed to avoid. To be clear,
> > > you and Christoph are the two reasons I've had to harp on process in the
> > > past - everyone else in the kernel has been fine.
>
> ...
>
> > In this case you very clearly misunderstood how the FUA bit works on
> > reads (and clearly documented that misunderstanding in the commit log).
>
> FUA reads are clearly documented in the NVME spec.
NVMe's Read with FUA documentation is a pretty short:
Force Unit Access (FUA): If this bit is set to `1ยด, then for data and
metadata, if any, associated with logical blocks specified by the Read
command, the controller shall:
1) commit that data and metadata, if any, to non-volatile medium; and
2) read the data and metadata, if any, from non-volatile medium.
So this aligns with ATA and SCSI.
The concern about what you're doing is with respect to "1". Avoiding that
requires all writes also set FUA, or you sent a flush command before doing any
reads. Otherwise how would you know whether or not read data came from a
previous cached write?
I'm actually more curious how you came to use this mechanism for recovery.
Have you actually seen this approach fix a real problem? In my experience, a
Read FUA is used to ensure what you're reading has been persistently committed.
It can be used as an optimization to flush a specific LBA range, but I'd not
seen this used as a bit error recovery mechanism.