Re: raid1_error() broken?

Corin Hartland-Swann Tue, 03 Apr 2001 13:26:02 -0700

Hi there,

I was just clearing out a backlog of e-mail and noticed that nobody
commented on this post. It's out of my league, but I was wondering if
anybody can shed any light on it...

Neal? Alvin?

Regards,

Corin

/------------------------+-------------------------------------\
| Corin Hartland-Swann   |    Tel: +44 (0) 20 7491 2000        |
| Commerce Internet Ltd  |    Fax: +44 (0) 20 7491 2010        |
| 22 Cavendish Buildings | Mobile: +44 (0) 79 5854 0027        | 
| Gilbert Street         |                                     |
| Mayfair                |    Web: http://www.commerce.uk.net/ |
| London W1K 5HJ         | E-Mail: [EMAIL PROTECTED]        |
\------------------------+-------------------------------------/

On Tue, 20 Mar 2001, Richard Hirst wrote:
> I've just started playing with RAID, so I may not have fully understood
> what is happening here, but...
> 
> Look at raid1_error:
> 
> static int raid1_error (mddev_t *mddev, kdev_t dev)
> {
>         raid1_conf_t *conf = mddev_to_conf(mddev);
>         struct mirror_info * mirrors = conf->mirrors;
>         struct mirror_info *tmp;
>         int disks = MD_SB_DISKS;
>         int i;
> 
>         if (conf->working_disks == 1) {
>                 /*
>                  * Uh oh, we can do nothing if this is our last disk, but
>                  * first check if this is a queued request for a device
>                  * which has just failed.
>                  */
>                 for (i = 0; i < disks; i++) {
>                         if (mirrors[i].dev==dev && !mirrors[i].operational)
>                                 return 0;
>                 }
>                 printk (LAST_DISK);
>         } else {
>                 /*
>                  * Mark disk as unusable
>                  */
>                 for (i = 0; i < disks; i++) {
>                         if (mirrors[i].dev==dev && mirrors[i].operational) {
>                                 mark_disk_bad(mddev, i);
>                                 break;
>                         }
>                 }
>         }
>         return 0;
> }
> 
> Now consider a two-disk mirror where one disk is ok and the other has
> just been hot-added back in.  That disk is marked 'spare' and
> 'operational' while it is being brought back into sync.
> 
> conf->working_disks is still 1, because it isn't updated until the sync
> is complete.
> 
> Now we get an error on the new disk; the code does printk (LAST_DISK),
> which is harmless but wrong (the error was not on our last disk).
> 
> Compare this with the case where you have a three-disk mirror where two
> disks are ok and the third has just been hot-added back in.
> conf->working_disks is 2.  An error on the new disk will take the
> 'else' path and end up calling mark_disk_bad(), which will decrement
> conf->working_disks to 1.  Oops.
> 
> Comments?
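
To make the two cases above easier to check, here is a minimal user-space
sketch of the quoted logic. It is not kernel code and I have not run it
against the real driver. The struct layout, NDISKS, model_error(),
in_sync_disks() and the two *_case() helpers are my own stand-ins, and
mark_disk_bad() is assumed to do only what Richard describes: clear
'operational' and decrement working_disks unconditionally.

```c
/* User-space model of the quoted raid1_error() logic; not kernel code. */

#define NDISKS 3                /* assumption: stand-in for MD_SB_DISKS */

struct mirror {
    int dev;                    /* 0 means the slot is unused */
    int operational;
    int spare;                  /* hot-added, still being resynced */
};

struct conf {
    struct mirror mirrors[NDISKS];
    int working_disks;          /* not bumped until a resync completes */
    int last_disk_msgs;         /* how often LAST_DISK would be printed */
};

/* Assumption taken from the post: always decrements working_disks,
 * even when the failed disk is a spare that was never counted. */
static void mark_disk_bad(struct conf *c, int i)
{
    c->mirrors[i].operational = 0;
    c->working_disks--;
}

/* Same control flow as the quoted raid1_error(). */
static int model_error(struct conf *c, int dev)
{
    int i;

    if (c->working_disks == 1) {
        for (i = 0; i < NDISKS; i++) {
            if (c->mirrors[i].dev == dev && !c->mirrors[i].operational)
                return 0;
        }
        c->last_disk_msgs++;    /* stands in for printk (LAST_DISK) */
    } else {
        for (i = 0; i < NDISKS; i++) {
            if (c->mirrors[i].dev == dev && c->mirrors[i].operational) {
                mark_disk_bad(c, i);
                break;
            }
        }
    }
    return 0;
}

/* Disks that actually hold a complete copy of the data. */
static int in_sync_disks(const struct conf *c)
{
    int i, n = 0;

    for (i = 0; i < NDISKS; i++) {
        if (c->mirrors[i].operational && !c->mirrors[i].spare)
            n++;
    }
    return n;
}

/* Two-disk mirror: dev 1 is good, dev 2 is a resyncing spare that fails. */
static struct conf two_disk_case(void)
{
    struct conf c = { { {1, 1, 0}, {2, 1, 1}, {0, 0, 0} }, 1, 0 };

    model_error(&c, 2);
    return c;
}

/* Three-disk mirror: devs 1 and 2 are good, dev 3 is a resyncing spare. */
static struct conf three_disk_case(void)
{
    struct conf c = { { {1, 1, 0}, {2, 1, 0}, {3, 1, 1} }, 2, 0 };

    model_error(&c, 3);
    return c;
}
```

Under those assumptions, two_disk_case() leaves last_disk_msgs at 1 even
though the good disk was never touched, and three_disk_case() leaves
working_disks at 1 while in_sync_disks() still reports 2, which matches
Richard's description.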


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]