Thanks Neil,
I just gave this patched module a shot on four systems. So far I haven't seen the device number inappropriately increment, though as per a mail I sent a short while ago, that already seemed remedied (for reasons I don't understand) by using the 1.2 superblock.
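For context, these arrays use the 1.2 superblock; the create command was roughly along these lines (device list and layout option trimmed, the other options just matching this setup):

  mdadm --create /dev/md0 --metadata=1.2 --level=10 --raid-devices=14 \
        /dev/dm-0 /dev/dm-1 ... /dev/dm-13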
However, the patch appears to have introduced a new issue, and another remains unresolved:

// BUG 1: The single-command syntax to fail and remove a drive is still failing; I do not know whether this is somehow contributing to the new issue below:

[EMAIL PROTECTED] tmp]# mdadm /dev/md0 --fail /dev/dm-0 --remove /dev/dm-0
mdadm: set /dev/dm-0 faulty in /dev/md0
mdadm: hot remove failed for /dev/dm-0: Device or resource busy
[EMAIL PROTECTED] tmp]# mdadm /dev/md0 --remove /dev/dm-0
mdadm: hot removed /dev/dm-0

// BUG 2: Upon adding or re-adding a drive that has been failed and removed, it is not used for resync. I had noticed previously that added drives weren't resynced until an in-progress array build finished and were only then grabbed; this, however, is a clean/active array that is rejecting the drive.
I've performed this identically on both a clean/active array and a newly-created (resyncing) array, with the same result. Even after a rebuild or a reboot, the removed drive isn't taken back; it remains listed as a "faulty spare", and dmesg indicates that it is "non-fresh".
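For reference, the whole sequence boils down to the following (the two-step fail/remove is the workaround for BUG 1; the sleep is only a guess at giving md a moment to settle, and the device names are just the ones from this box):

  mdadm /dev/md0 --fail /dev/dm-0
  sleep 1                              # guess: let md settle after marking it faulty
  mdadm /dev/md0 --remove /dev/dm-0    # succeeds when issued as a separate command
  mdadm /dev/md0 --add /dev/dm-0       # re-add the drive
  cat /proc/mdstat                     # I'd expect recovery to start here, but it doesn't
  mdadm -D /dev/md0                    # the drive just sits there as a "faulty spare"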
// DMESG:
md: kicking non-fresh dm-0 from array!
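As a sanity check on the "non-fresh" message (my understanding is that md kicks a member whose recorded event count lags behind the array's), this is the comparison I'd make, same device names as above:

  mdadm --examine /dev/dm-0 | grep -i events   # event count in the kicked member's superblock
  mdadm --detail /dev/md0 | grep -i events     # event count the array is currently at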
// ARRAY status 'mdadm -D /dev/md0'
State : active, degraded
Active Devices : 13
Working Devices : 13
Failed Devices : 1
Spare Devices : 0
Layout : near=1, offset=2
Chunk Size : 512K
Name : 0
UUID : 05c2faf4:facfcad3:ba33b140:100f428a
Events : 22
    Number   Major   Minor   RaidDevice   State
       0      253       1        0        active sync   /dev/dm-1
       1      253       2        1        active sync   /dev/dm-2
       2      253       5        2        active sync   /dev/dm-5
       3      253       4        3        active sync   /dev/dm-4
       4      253       6        4        active sync   /dev/dm-6
       5      253       3        5        active sync   /dev/dm-3
       6      253      13        6        active sync   /dev/dm-13
       7        0       0        7        removed
       8      253       7        8        active sync   /dev/dm-7
       9      253       8        9        active sync   /dev/dm-8
      10      253       9       10        active sync   /dev/dm-9
      11      253      11       11        active sync   /dev/dm-11
      12      253      10       12        active sync   /dev/dm-10
      13      253      12       13        active sync   /dev/dm-12
       7      253       0        -        faulty spare   /dev/dm-0
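One thing I can still try (just a guess at a workaround on my part, not something I've verified) is wiping the member's superblock before re-adding it, so md has no stale event count to object to:

  mdadm /dev/md0 --remove /dev/dm-0    # make sure the slot is actually free
  mdadm --zero-superblock /dev/dm-0    # discard the stale md metadata on the member
  mdadm /dev/md0 --add /dev/dm-0       # it should then come back as a fresh spare and resync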
Let me know what more I can do to help track this down. I'm reverting this patch for now, since the array behaves worse with it than before. I'll be happy to try others.
Attached are typescripts of the drive remove/add sessions and all output.

/eli

Neil Brown wrote:
On Friday October 6, [EMAIL PROTECTED] wrote:
>
> This patch has resolved the immediate issue I was having on 2.6.18 with
> RAID10. Previous to this change, after removing a device from the array
> (with mdadm --remove), physically pulling the device and
> changing/re-inserting, the "Number" of the new device would be
> incremented on top of the highest-present device in the array. Now, it
> resumes its previous place.
>
> Does this look like 'correct' output for a 14-drive array from which dev 8
> was failed/removed and then re-added? I'm trying to determine why the
> device doesn't get pulled back into the active configuration and
> re-synced. Any comments?
Does this patch help?
Fix count of degraded drives in raid10.
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
### Diffstat output
./drivers/md/raid10.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c 2006-10-09 14:18:00.000000000 +1000
+++ ./drivers/md/raid10.c 2006-10-05 20:10:07.000000000 +1000
@@ -2079,7 +2079,7 @@ static int run(mddev_t *mddev)
 		disk = conf->mirrors + i;
 
 		if (!disk->rdev ||
-		    !test_bit(In_sync, &rdev->flags)) {
+		    !test_bit(In_sync, &disk->rdev->flags)) {
 			disk->head_position = 0;
 			mddev->degraded++;
 		}
NeilBrown
