What seems to be happening is that raid1_stop_resync() is being called
and the mirror IS being marked as not operational, but this has no
effect: the second disk continues to be accessed.
I see that PID 6 is interrupted; that is mdrecoveryd.
Interestingly, I never see the following message, which
raid1_stop_resync() prints when it interrupts a running resync:
    raid1: mirror resync was not fully finished, restarting next time.
static int raid1_stop_resync (mddev_t *mddev)
{
    raid1_conf_t *conf = mddev_to_conf(mddev);

    /* MOD BY EZA 09/14/00 */
    printk ("raid1: request to stop resync\n");
    if (conf->resync_thread) {
        if (conf->resync_mirrors) {
            conf->resync_mirrors = 2;
            md_interrupt_thread(conf->resync_thread);
            printk(KERN_INFO "raid1: mirror resync was not fully finished, restarting next time.\n");
            return 1;
        }
        return 0;
    }
    return 0;
}
Never seeing that message seems to indicate that the "raid1syncd"
process is not running (is there a good way to verify this?). I am
looking at how resync_thread is set, and I can't find an error there.
OK, if that process isn't writing to the disk, then which one is?
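One possible way to check from user space, as a minimal sketch: kernel threads appear under /proc like ordinary tasks, and /proc/<pid>/stat begins with "pid (comm) state ...". The helper names below (stat_comm_matches, find_thread_pid) are made up for illustration and are not part of the md code, and the sketch assumes the thread's command name is exactly "raid1syncd".

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if a /proc/<pid>/stat line names the given command, else 0.
 * The line has the form "pid (comm) state ...". */
static int stat_comm_matches(const char *stat_line, const char *name)
{
    const char *lp = strchr(stat_line, '(');
    const char *rp = strrchr(stat_line, ')');
    size_t len;

    if (!lp || !rp || rp < lp)
        return 0;
    len = (size_t)(rp - lp - 1);
    return len == strlen(name) && strncmp(lp + 1, name, len) == 0;
}

/* Scan /proc for a task whose command name is `name`.
 * Returns its pid, or -1 if none is found or /proc is unreadable. */
static int find_thread_pid(const char *name)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int found = -1;

    if (!proc)
        return -1;
    while (found < 0 && (de = readdir(proc)) != NULL) {
        char path[300], line[512];
        FILE *fp;

        /* Only numeric directory names are tasks. */
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;
        snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
        fp = fopen(path, "r");
        if (!fp)
            continue;
        if (fgets(line, sizeof(line), fp) && stat_comm_matches(line, name))
            found = atoi(de->d_name);
        fclose(fp);
    }
    closedir(proc);
    return found;
}
```

Calling find_thread_pid("raid1syncd") while the array is degraded would show whether the resync thread exists at that moment; a result of -1 would be consistent with the message never being printed.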
Now I'm looking at 'md.c' some more...
-Eric.
--------------------------------------------------------------------------------
Here's what I did:
I added some printk's to raid1.o:

At the top of raid1_error():
    printk ("raid1: total working disks is %d. disk[0].operational=%d disk[1].operational=%d\n",
            conf->working_disks,
            mirrors[0].operational,
            mirrors[1].operational);
At the top of raid1_stop_resync():
    printk ("raid1: request to stop resync\n");
At the top of raid1_restart_resync():
    printk ("raid1: request to start resync\n");
Sep 15 05:58:33 dru1a kernel: md: using 128k window.
Sep 15 05:58:35 dru1a kernel: scsi0 channel 0 : resetting for second half of retries.
Sep 15 05:58:35 dru1a kernel: SCSI bus is being reset for host 0 channel 0.
Sep 15 05:58:38 dru1a kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 26030000
Sep 15 05:58:38 dru1a kernel: scsidisk I/O error: dev 08:11, sector 0
Sep 15 05:58:38 dru1a kernel: raid1: request to stop resync
Sep 15 05:58:38 dru1a kernel: interrupting MD-thread pid 6
Sep 15 05:58:38 dru1a kernel: raid1: total working disks is 1. disk[0].operational=1 disk[1].operational=0
Sep 15 05:58:38 dru1a kernel: raid1: only one disk left and IO error.
And then these sets of messages are repeated over and over:
Sep 15 06:00:29 dru1a kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 26030000
Sep 15 06:00:29 dru1a kernel: scsidisk I/O error: dev 08:11, sector 184
Sep 15 06:00:29 dru1a kernel: raid1: request to stop resync
Sep 15 06:00:29 dru1a kernel: interrupting MD-thread pid 6
Sep 15 06:00:29 dru1a kernel: raid1: total working disks is 1. disk[0].operational=1 disk[1].operational=0
Sep 15 06:00:29 dru1a kernel: raid1: only one disk left and IO error.
Sep 15 06:00:30 dru1a kernel: scsi0 channel 0 : resetting for second half of retries.
Sep 15 06:00:30 dru1a kernel: SCSI bus is being reset for host 0 channel 0.
...
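
For reading the "return code = 26030000" lines, here is a small sketch that splits the value into its four bytes. It assumes the value is printed in hex and packed the way the Linux SCSI midlayer usually packs a result (status in the low byte, then message, host, and driver bytes). The struct and function names are made up for illustration, and the meaning of each byte should be checked against scsi.h in the kernel tree in use.

```c
#include <stdio.h>

/* Hypothetical container for the four bytes of a SCSI result word. */
struct scsi_result {
    unsigned status;  /* bits 0-7:   target status byte */
    unsigned msg;     /* bits 8-15:  message byte       */
    unsigned host;    /* bits 16-23: host adapter byte  */
    unsigned driver;  /* bits 24-31: driver byte        */
};

/* Split a 32-bit SCSI result word into its component bytes. */
static struct scsi_result split_scsi_result(unsigned long code)
{
    struct scsi_result r;

    r.status = (unsigned)(code & 0xff);
    r.msg    = (unsigned)((code >> 8) & 0xff);
    r.host   = (unsigned)((code >> 16) & 0xff);
    r.driver = (unsigned)((code >> 24) & 0xff);
    return r;
}

/* Print the decoded bytes in hex, one field per column. */
static void print_scsi_result(unsigned long code)
{
    struct scsi_result r = split_scsi_result(code);

    printf("driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n",
           r.driver, r.host, r.msg, r.status);
}
```

For 0x26030000 this gives driver byte 0x26 and host byte 0x03, with message and status both zero. If the usual scsi.h values apply, a host byte of 0x03 is a timeout, which would fit a disk that has stopped responding.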