Re: [RFD] diskfilter stale RAID member detection vs. lazy scanning

Andrei Borzenkov Mon, 11 Jan 2016 13:36:25 -0800

16.07.2015 11:01, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 16.07.2015 05:42, Andrei Borzenkov wrote:
>> On Wed, 15 Jul 2015 20:05:56 +0200
>> Vladimir 'φ-coder/phcoder' Serbinenko <phco...@gmail.com> wrote:
>>
>>> On 28.06.2015 20:06, Andrei Borzenkov wrote:
>>>> I was looking at implementing detection of outdated RAID members.
>>>> Unfortunately it appears to be fundamentally incompatible with lazy
>>>> scanning as currently implemented by GRUB. We simply cannot stop
>>>> scanning for other copies of metadata once "enough" have been seen,
>>>> because any other disk may contain a more recent copy which
>>>> invalidates everything seen up to this point.
>>>>
>>>> So basically either we officially admit that GRUB is not able to detect
>>>> stale members or we drop lazy scanning.
>>>>
>>>> Comments, ideas?
>>>>
>>> We don't need to see all disks to decide that there is no staleness. If
>>> you have an array with N devices and you can lose at most K of them,
>>> then you can check for staleness after you have seen max(K+1, N-K)
>>> drives. Why?
>>>
>>
>> That's not the problem. The problem is what to do if you see a disk
>> with generation N+1 after you have assembled the array with generation
>> N. This can mean that what we have seen so far is an old copy and we
>> should throw it away and start collecting a new one. If I read the
>> Linux MD code correctly, that is what it actually does. And this means
>> we cannot stop scanning even after the array is complete.
>>
> While it's true that all the members we have seen may be stale, it
> shouldn't be common and it's not the biggest problem. The biggest
> problem is inconsistency.
> We can never guarantee that we have seen all the disks, as they may not
> even be visible through firmware, but that shouldn't stop us from
> fixing the inconsistency problem.
>> An extreme example is a three-piece mirror where each piece is actually
>> perfectly valid and usable by itself, so losing two of them still means
>> we can continue to work with the remaining one.
>>
> Mirrors get completely assembled in my patch.
>
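
To make the max(K+1, N-K) rule quoted above concrete, here is a minimal
sketch in C. It is only an illustration: staleness_check_threshold is a
hypothetical name, not a GRUB function, and the rationale in the comments
(N-K members are needed to assemble, and K+1 members include at least one
current member if at most K are stale) is one reading of the rule, not
something spelled out in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GRUB.
   n: number of devices in the array.
   k: maximum number of devices the array can lose.
   Returns how many members must be seen before deciding on staleness. */
size_t
staleness_check_threshold (size_t n, size_t k)
{
  size_t enough_to_assemble;  /* n - k members make the array usable */
  size_t enough_for_fresh;    /* k + 1 members: assumed to include a current one */

  assert (k < n);

  enough_to_assemble = n - k;
  enough_for_fresh = k + 1;

  return enough_for_fresh > enough_to_assemble
         ? enough_for_fresh : enough_to_assemble;
}
```

Under this reading, a four-disk RAID5 (N=4, K=1) could be checked after 3
members, while a three-piece mirror (N=3, K=2) needs all 3, which would be
consistent with the remark that mirrors get completely assembled.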


I fixed a trivial read error in the raid1/raid10 case (see attached patch).
It works in naive testing. We need regression tests for stale data.
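
As a starting point for reasoning about such tests, the restart rule
discussed in the quoted text (a member with a higher generation invalidates
everything collected so far) can be modelled in a few lines of C. This is a
toy model under stated assumptions, not GRUB or Linux MD code: struct
member_set and member_set_add are hypothetical names, and the generation is
treated as a plain monotonically increasing counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the members collected so far. */
struct member_set
{
  uint64_t generation;  /* generation shared by all collected members */
  size_t count;         /* number of members collected at that generation */
};

/* Offer one more member to the set.
   Returns 1 if the member was accepted, 0 if it was rejected as stale. */
int
member_set_add (struct member_set *set, uint64_t generation)
{
  assert (set != NULL);

  if (set->count == 0 || generation > set->generation)
    {
      /* First member, or a newer generation: everything collected so far
         is considered an old copy and collection starts over. */
      set->generation = generation;
      set->count = 1;
      return 1;
    }

  if (generation < set->generation)
    return 0;  /* older than what we already have: stale member */

  set->count++;
  return 1;
}
```

The model shows why lazy scanning is a problem here: the result depends on
members that may only show up after the set already looks complete.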

From 2611f7a1649e9564cf65b1312bd76e5f3feb3a3e Mon Sep 17 00:00:00 2001
From: Andrei Borzenkov <arvidj...@gmail.com>
Date: Mon, 11 Jan 2016 23:41:13 +0300
Subject: [PATCH] Fix reading from RAID1 and RAID10

Need to set error if current disk is stale.
---
 grub-core/disk/diskfilter.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/grub-core/disk/diskfilter.c b/grub-core/disk/diskfilter.c
index 7fea4c0..d779a0a 100644
--- a/grub-core/disk/diskfilter.c
+++ b/grub-core/disk/diskfilter.c
@@ -782,6 +782,9 @@ read_segment (struct grub_diskfilter_segment *seg, grub_disk_addr_t sector,
 				 && err != GRUB_ERR_UNKNOWN_DEVICE)
 			  return err;
 		      }
+		    else
+		      err = GRUB_ERR_READ_ERROR;
+
 		    k++;
 		    if (k == seg->node_count)
 		      k = 0;
-- 
1.9.1
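
The effect of the added else branch can be illustrated with a standalone
sketch of a mirror read loop. This is not the code from diskfilter.c: the
types and the function mirror_read are hypothetical, and the sketch assumes
that the status starts out as "no error" before the loop, which is only a
guess at the surrounding context of the patch.

```c
#include <assert.h>
#include <stddef.h>

enum read_status { READ_OK = 0, READ_ERROR = 1 };

/* Hypothetical stand-in for one mirror member. */
struct mirror_member
{
  int usable;      /* 0 if the member is missing or was rejected as stale */
  int read_fails;  /* 1 to simulate an I/O error on a usable member */
};

/* Try each member once, starting at index start and wrapping around.
   Returns READ_OK as soon as one usable member can be read. */
enum read_status
mirror_read (const struct mirror_member *members, size_t count, size_t start)
{
  enum read_status err = READ_OK;  /* assumed initial state: no error yet */
  size_t k = start;
  size_t tries;

  assert (count > 0 && start < count);

  for (tries = 0; tries < count; tries++)
    {
      if (members[k].usable)
        {
          err = members[k].read_fails ? READ_ERROR : READ_OK;
          if (err == READ_OK)
            break;
        }
      else
        /* Without this, skipping every member would leave err == READ_OK
           even though nothing was read. */
        err = READ_ERROR;

      k++;
      if (k == count)
        k = 0;
    }

  return err;
}
```

In this model a mirror whose members are all stale returns READ_ERROR
instead of silently reporting success, which is the behaviour the patch
aims for.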


_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/grub-devel
