Re: [PATCH] fix random failures in shell/integrity.sh

John Stoffel Thu, 07 Aug 2025 07:38:42 -0700

>>>>> "Mikulas" == Mikulas Patocka <mpato...@redhat.com> writes:

> On Thu, 7 Aug 2025, Stuart D Gathman wrote:

>> On Wed, 6 Aug 2025, John Stoffel wrote:
>> 
>> > >>>>> "Mikulas" == Mikulas Patocka <mpato...@redhat.com> writes:
>> > 
>> > > The problem is that the raid1 implementation may freely choose which leg
>> > > to read from. If it chooses to read from the non-corrupted leg, the
>> > > corruption is not detected, the number of mismatches is not incremented
>> > > and the test reports this as a failure.
>> > 
>> > So wait, how is integrity supposed to work in this situation then?  In
>> > real life?  I understand the test is hard, maybe doing it in a loop
>> > three times?  Or configure the RAID1 to prefer one half over another
>> > is the way to make this test work?

> If you want to make sure that you detect (and correct) all mismatches, you 
> have to scrub the raid array.
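
To illustrate the two points above (a normal read may miss the
corruption, a scrub cannot), here is a minimal toy model in Python. It is
not the dm-raid code: Raid1Model, corrupt(), read() and scrub() are
hypothetical names, the leg choice is simply random, and comparing a
block against a known-good value stands in for the integrity checksum.

```python
import random


class Raid1Model:
    """Toy raid1: every leg holds a copy of each block (hypothetical model)."""

    GOOD = b"good"

    def __init__(self, nblocks, nlegs=2, seed=0):
        self.legs = [[self.GOOD] * nblocks for _ in range(nlegs)]
        self.rng = random.Random(seed)
        self.mismatches = 0

    def corrupt(self, leg, block):
        # Damage one copy only, as the test does to a single leg.
        self.legs[leg][block] = b"bad!"

    def _verify(self, leg, block):
        # Stand-in for the integrity layer's checksum verification.
        if self.legs[leg][block] != self.GOOD:
            self.mismatches += 1

    def read(self, block):
        # raid1 may read from any leg; the choice here is random.
        leg = self.rng.randrange(len(self.legs))
        self._verify(leg, block)

    def scrub(self):
        # A scrub visits every block on every leg.
        for leg in range(len(self.legs)):
            for block in range(len(self.legs[leg])):
                self._verify(leg, block)


def mismatches_after_one_read(seed):
    array = Raid1Model(nblocks=4, seed=seed)
    array.corrupt(leg=0, block=2)
    array.read(2)
    return array.mismatches


def mismatches_after_scrub(seed):
    array = Raid1Model(nblocks=4, seed=seed)
    array.corrupt(leg=0, block=2)
    array.scrub()
    return array.mismatches
```

In this model a single read counts the mismatch only when it happens to
land on the damaged leg, so the count depends on the seed, while a scrub
counts it on every run.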

And how do you know which level of the array is showing the errors?  I
could have a RAID1 array composed of a single partition on the left
side, but then a RAID0 of two smaller disks on the right side.  So how
would this read() flag know what to do?  

I would assume the integrity sub-system would be reading from both
sides and comparing them to look for errors.  When you find a
mis-match, how do you tell which side is wrong?  
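
One possible answer, sketched under the assumption that each leg keeps
its own per-block checksum below the raid1 layer (which is how
dm-integrity under an LVM raid1 is generally described): a leg whose
stored checksum does not match its data fails the read, so the mirror
never has to compare the two sides to decide which one is wrong. The
names IntegrityLeg and raid1_read are hypothetical, and CRC32 is only a
stand-in for whatever checksum is configured.

```python
import zlib


class IntegrityLeg:
    """Toy leg that stores a checksum next to every block (hypothetical)."""

    def __init__(self, blocks):
        self.data = list(blocks)
        self.sums = [zlib.crc32(b) for b in blocks]

    def write(self, index, block):
        self.data[index] = block
        self.sums[index] = zlib.crc32(block)

    def read(self, index):
        # The leg itself reports the error; the layer above sees an I/O error.
        if zlib.crc32(self.data[index]) != self.sums[index]:
            raise IOError(f"checksum mismatch on block {index}")
        return self.data[index]


def raid1_read(legs, index):
    """Return (block, repaired_legs), trying the legs in order.

    Legs that failed before a good copy was found are rewritten from it.
    """
    failed = []
    for number, leg in enumerate(legs):
        try:
            block = leg.read(index)
        except IOError:
            failed.append(number)
            continue
        for bad in failed:
            legs[bad].write(index, block)
        return block, failed
    raise IOError(f"block {index} is unreadable on every leg")
```

Under the same assumption, the nested layout works layer by layer: each
leg only needs to return good data or an error to the layer above it.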

>> Linux needs an optional parameter to read() syscall that is "leg index"
>> for the blk interface.  Thus, btrfs scrub can check all legs, and this
>> test can check all legs.  Filesystems with checks can repair corruption
>> by rewriting the block after finding a leg with correct csum.
>> 
>> This only needs a few bits (how many legs can there be?), so can go in
>> the FLAGS argument.

> I think that adding a new bit for the read syscalls is not a
> workable solution. There are so many programs using the read()
> syscall and teaching them to use this new bit is impossible.

It's also the completely wrong level for this type of support.  But
maybe they can convince us with a pseudo-code example of how an
application would use this read() extension to solve the issue?  

But right now, it's just for testing, and if the tests don't work
correctly without a silly workaround like this, then your tests aren't
representative of real-world use!

John
