More on uninterrupted resync

Eric Z. Ayers Fri, 15 Sep 2000 07:23:03 -0700

What seems to be happening is that the raid1_stop_resync() function is
being called, the mirror IS being marked as not operational,
but it is not having an effect - the second disk continues to be
accessed. 

I see that PID 6 is interrupted - that is mdrecoveryd.

Interestingly, I never see the message:

 raid1: mirror resync was not fully finished, restarting next time.


static int raid1_stop_resync (mddev_t *mddev)
{
        raid1_conf_t *conf = mddev_to_conf(mddev);

        /* MOD BY EZA 09/14/00 */
        printk ("raid1: request to stop resync\n");
        
        if (conf->resync_thread) {
                if (conf->resync_mirrors) {
                        conf->resync_mirrors = 2;
                        md_interrupt_thread(conf->resync_thread);
                        printk(KERN_INFO "raid1: mirror resync was not fully finished, restarting next time.\n");
                        return 1;
                }
                return 0;
        }
        return 0;
}


That seems to indicate that "raid1syncd" is not running (is there a
good way to verify this?).  I am looking at how resync_thread is set
and I can't find an error there.  But if that process isn't writing
to the disk, then which one is?
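
To make that reasoning concrete, here is a minimal user-space model of
the branch structure in raid1_stop_resync().  The names model_conf and
model_stop_resync are made up for illustration and are not kernel code;
the sketch only assumes the two fields the function tests.  It shows
that the "not fully finished" path is reached only when resync_thread
is set AND resync_mirrors is non-zero, so a missing message means at
least one of those two conditions was false.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two raid1_conf_t fields that
 * raid1_stop_resync() tests; not the real kernel structure. */
struct model_conf {
        void *resync_thread;   /* non-NULL if a resync thread is registered */
        int   resync_mirrors;  /* non-zero while a resync is in progress */
};

/* Same branch structure as raid1_stop_resync().  Returns 1 only on the
 * path where the kernel function would interrupt the thread and print
 * the "not fully finished" message; returns 0 on every other path. */
int model_stop_resync(struct model_conf *conf)
{
        if (conf->resync_thread) {
                if (conf->resync_mirrors) {
                        conf->resync_mirrors = 2;
                        return 1;
                }
                return 0;
        }
        return 0;
}
```

With no thread registered, or with a thread but resync_mirrors == 0,
the model returns 0 without touching anything; only the combination of
both takes the message path.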

Now I'm looking at 'md.c' some more...

-Eric.

--------------------------------------------------------------------------------

Here's what I did:

I added some printk's to raid1.o:

At the top of raid1_error()
        printk ("raid1: total working disks is %d. disk[0].operational=%d disk[1].operational=%d\n",
                conf->working_disks,
                mirrors[0].operational,
                mirrors[1].operational);

At the top of raid1_stop_resync:
        printk ("raid1: request to stop resync\n");

At the top of raid1_restart_resync:
        printk ("raid1: request to start resync\n");


Sep 15 05:58:33 dru1a kernel: md: using 128k window.
Sep 15 05:58:35 dru1a kernel: scsi0 channel 0 : resetting for second half of retries.
Sep 15 05:58:35 dru1a kernel: SCSI bus is being reset for host 0 channel 0.
Sep 15 05:58:38 dru1a kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 26030000
Sep 15 05:58:38 dru1a kernel: scsidisk I/O error: dev 08:11, sector 0
Sep 15 05:58:38 dru1a kernel: raid1: request to stop resync
Sep 15 05:58:38 dru1a kernel: interrupting MD-thread pid 6
Sep 15 05:58:38 dru1a kernel: raid1: total working disks is 1. disk[0].operational=1 disk[1].operational=0
Sep 15 05:58:38 dru1a kernel: raid1: only one disk left and IO error.


And then these sets of messages are repeated over and over:


Sep 15 06:00:29 dru1a kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 26030000
Sep 15 06:00:29 dru1a kernel: scsidisk I/O error: dev 08:11, sector 184
Sep 15 06:00:29 dru1a kernel: raid1: request to stop resync
Sep 15 06:00:29 dru1a kernel: interrupting MD-thread pid 6
Sep 15 06:00:29 dru1a kernel: raid1: total working disks is 1. disk[0].operational=1 disk[1].operational=0
Sep 15 06:00:29 dru1a kernel: raid1: only one disk left and IO error.
Sep 15 06:00:30 dru1a kernel: scsi0 channel 0 : resetting for second half of retries.
Sep 15 06:00:30 dru1a kernel: SCSI bus is being reset for host 0 channel 0.

...