Hi,

I've just started playing with RAID, so I may not have fully understood
what is happening here, but...

Look at raid1_error:

static int raid1_error (mddev_t *mddev, kdev_t dev)
{
	raid1_conf_t *conf = mddev_to_conf(mddev);
	struct mirror_info * mirrors = conf->mirrors;
	struct mirror_info *tmp;
	int disks = MD_SB_DISKS;
	int i;

	if (conf->working_disks == 1) {
		/*
		 * Uh oh, we can do nothing if this is our last disk, but
		 * first check if this is a queued request for a device
		 * which has just failed.
		 */
		for (i = 0; i < disks; i++) {
			if (mirrors[i].dev==dev && !mirrors[i].operational)
				return 0;
		}
		printk (LAST_DISK);
	} else {
		/*
		 * Mark disk as unusable
		 */
		for (i = 0; i < disks; i++) {
			if (mirrors[i].dev==dev && mirrors[i].operational) {
				mark_disk_bad(mddev, i);
				break;
			}
		}
	}
	return 0;
}

Now consider a two-disk mirror where one disk is ok and the other has
just been hotadded back in. That disk is marked 'spare' and
'operational' while it is being brought back into sync.
conf->working_disks is still 1, because it isn't updated until the sync
is complete.

Now we get an error on the new disk; the code takes the first branch,
finds the disk still operational, and does printk (LAST_DISK), which is
harmless but wrong (the error was not on our last disk).

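Compare this with the case where you have a three-disk mirror where two
disks are ok and the third has just been hotadded back in.
conf->working_disks is 2. An error on the new disk will take the
'else' path and end up calling mark_disk_bad(), which will decrement
conf->working_disks to 1, even though both good disks are still fine.
Oops. As far as I can see, a later error on either good disk would then
be treated as an error on the last disk and the disk would not be
marked bad.
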
Comments?
Richard