On Wed, 21 Aug 2024, David Chu wrote:
> From: Mikulas Patocka <mpato...@redhat.com>
>
> Hi Mikulas,
>
> On Wed, 21 Aug 2024, Mikulas Patocka wrote:
> > 'D' - do nothing - dm-integrity doesn't do anything to try to maintain
> > data/metadata integrity - if the system crashes, the metadata may be
> > corrupted. It may be useful for things like operating system installation,
> > where you don't recover from a crash at all.
>
> Thanks for the quick and detailed response! I am actually *not interested
> in crashes*, but in what happens during a normal run, when there are two data
> writes to the same sector on disk. Let's say these writes are write A and
> write B, and we are running dm-integrity in 'D' mode (so there is no journal).
>
> dm-integrity makes sure that if the writes' sector ranges intersect, then one
> write will not be sent to disk until the other returns, like so:
>
>               Write A and B
>                     |
>                     v
> -----------------------------------------
> |             dm-integrity              |
> -----------------------------------------
>   |           ^                |
>   v Write A   | Write A end_io v Write B
> -----------------------------------------
> |                 disk                  |
> -----------------------------------------
>
> dm-integrity then stores the hash of write B.
>
> This behavior suggests to me that dm-integrity assumes that if write A returns
> before write B is sent to disk, then write A must be written to disk *before*
> write B (or maybe write A is never written, but in any case, write B is the
> final write). Otherwise, if the disk reorders write A and write B, then there
> would be a mismatch between the hash that dm-integrity stores and the actual
> write on disk.
>
> Is this the assumption dm-integrity is making?
> And if so, how does it square with the hardware reordering I/O requests?
>
> Thanks,
> David
Hi

There is a red-black tree of all in-progress I/O (see ic->in_progress).
When we start an I/O, we add it to the tree with "add_new_range", and when
we end an I/O, we delete it from the tree with "remove_range_unlocked".
The tree makes sure that there are no overlapping I/Os in progress.
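
As a rough illustration of the overlap check (a userspace Python toy, not
the kernel code): the names InProgress, Range and ranges_overlap are made up
for this sketch, and a plain list stands in for the red-black tree. Only the
idea is taken from the driver: a new range is admitted only if it does not
intersect any range that is still in flight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A sector range [logical_sector, logical_sector + n_sectors)."""
    logical_sector: int
    n_sectors: int

    @property
    def end(self):
        # Exclusive end sector of the range.
        return self.logical_sector + self.n_sectors


def ranges_overlap(a, b):
    # Two half-open ranges intersect iff each one starts before the other ends.
    return a.logical_sector < b.end and b.logical_sector < a.end


class InProgress:
    """Toy stand-in for the in-progress tree (a list instead of an rb-tree)."""

    def __init__(self):
        self._ranges = []

    def add_new_range(self, r):
        # Refuse the range if it intersects any I/O that is still in flight.
        if any(ranges_overlap(r, other) for other in self._ranges):
            return False
        self._ranges.append(r)
        return True

    def remove_range(self, r):
        # Called when the I/O has completed (its end_io has run).
        self._ranges.remove(r)


tree = InProgress()
write_a = Range(logical_sector=100, n_sectors=8)
write_b = Range(logical_sector=104, n_sectors=8)  # intersects write A

a_admitted = tree.add_new_range(write_a)        # A goes to disk
b_admitted_early = tree.add_new_range(write_b)  # B must wait for A
tree.remove_range(write_a)                      # A's end_io
b_admitted_late = tree.add_new_range(write_b)   # now B may go to disk
print(a_admitted, b_admitted_early, b_admitted_late)
```

In this toy, a rejected write is simply retried by the caller; how the real
driver makes the second I/O wait is not modelled here.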

Regarding disk reordering - the disk may reorder I/Os, but as long as there
is no crash, the disk must appear coherent. Therefore, if we write A, get
A's endio and then write B, a later read of this location must return B;
it can't return A.
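
To make the coherence argument concrete, here is a small Python toy model
(hypothetical, not how any real disk firmware works). CoherentDisk keeps
acknowledged writes in a pending list and may move them to the media in any
order, but a read always returns the most recently acknowledged data for the
sector. SHA-256 is only a stand-in for whatever hash dm-integrity is
configured to use.

```python
import hashlib


class CoherentDisk:
    """Toy disk: internal reordering is allowed, reads stay coherent."""

    def __init__(self):
        self.media = {}    # sector -> (seq, data) already on the platter
        self.pending = []  # (seq, sector, data) acknowledged, not yet on media
        self.seq = 0

    def write(self, sector, data):
        # Returning from write() models the end_io of that write.
        self.seq += 1
        self.pending.append((self.seq, sector, data))

    def read(self, sector):
        # Coherence: the newest acknowledged write to the sector wins,
        # whether it is still pending or already on the media.
        candidates = [(s, d) for s, sec, d in self.pending if sec == sector]
        if sector in self.media:
            candidates.append(self.media[sector])
        return max(candidates)[1] if candidates else None

    def destage_all(self, newest_first=False):
        # Move pending writes to the media, optionally in reversed order.
        # An older write never replaces newer data that is already there.
        for seq, sector, data in sorted(self.pending, reverse=newest_first):
            current = self.media.get(sector)
            if current is None or current[0] < seq:
                self.media[sector] = (seq, data)
        self.pending.clear()


disk = CoherentDisk()
disk.write(7, b"A")  # write A, end_io received
disk.write(7, b"B")  # write B is issued only after A's end_io
stored_tag = hashlib.sha256(b"B").digest()  # the hash kept for the sector

read_before = disk.read(7)            # both writes still pending
disk.destage_all(newest_first=True)   # disk processes B before A
read_after = disk.read(7)
tag_matches = hashlib.sha256(read_after).digest() == stored_tag
print(read_before, read_after, tag_matches)
```

Even with the reversed destage order, the read returns B, so the stored
hash of B matches what is read back.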

Reordering only becomes a problem if the system crashes (in that case, it
is unknown whether the disk will return A or B after the crash). I think
that the SCSI standard even allows reading garbage after a crash.
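
The crash case can be sketched the same way. This Python toy assumes that
no flush was issued and that the disk may persist acknowledged writes in
any order, or not at all, before power is lost. media_after_crash is a
made-up helper, and the "garbage" outcome is not modelled.

```python
import itertools


def media_after_crash(acknowledged, persisted_order):
    """Return the media contents that survive a crash.

    acknowledged:    list of (sector, data), in the order end_io was reported
    persisted_order: indices of the writes that reached the media before the
                     crash, in the order in which they reached it
    """
    media = {}
    for i in persisted_order:
        sector, data = acknowledged[i]
        media[sector] = data  # a later arrival overwrites an earlier one
    return media


writes = [(7, b"A"), (7, b"B")]  # A acknowledged first, then B

# Enumerate every subset of the writes in every arrival order.
outcomes = set()
for k in range(len(writes) + 1):
    for order in itertools.permutations(range(len(writes)), k):
        outcomes.add(media_after_crash(writes, order).get(7))

print(outcomes)
```

Under these assumptions the sector may hold the old contents (None here), A,
or B after the crash, so a stored hash of B is not guaranteed to match.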

Mikulas