> -----Original Message-----
> From: Bruno Prior [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 22, 1999 8:30 AM
> To: Linux-Raid; Michel Pelletier
> Subject: RE: errors on boot
>
>
> Michel,
>
> Thanks for that. It makes things much clearer.
>
> > Nov 16 13:36:28 korak kernel: sdb4's event counter: 00000016
> > Nov 16 13:36:28 korak kernel: sda4's event counter: 00000017
> > Nov 16 13:36:28 korak kernel: md: superblock update time inconsistency -- using the most recent one
> > Nov 16 13:36:28 korak kernel: freshest: sda4
> > Nov 16 13:36:28 korak kernel: md2: kicking faulty sdb4!
>
> This is the crucial part. I think it speaks for itself. The mirrors are
> out of sync, so the RAID code assumes that there is a problem and kicks
> the partition that was updated less recently (sdb4) out of the array.
> Did you have an unclean shutdown, or was there a problem with the RAID
> before the last shutdown such that sdb4 was kicked out of the array?
> Something like this must have happened.

Yes, before shutdown we were getting errors on the console about
'writing beyond the end of a device'. This is why I rebooted the
machine.

> Anyway, the solution is very simple. Just do
> "raidhotadd /dev/md2 /dev/sdb4". You don't need to
> "raidhotremove /dev/md2 /dev/sdb4", because it has already been kicked
> out of the array at startup. This will add /dev/sdb4 back into the
> array. The RAID code should start resyncing it automatically once it
> has been added back in. Have a look in /proc/mdstat and you should see
> how the resyncing process is going. Make sure you don't shut down
> before resyncing has completed, or you will be back to square one. But
> you can use the array quite happily while it is resyncing.

You the man! Worked like a charm.

> It would be a good idea to try to figure out why the mirrors were out
> of sync, in case this reveals a problem. If it was an unclean shutdown,
> then there's no problem (apart from making sure you don't do it again).
> But if sdb4 had been kicked out of the array, you need to know why to
> make sure it doesn't happen again. If this was the case, you will need
> to check back through your syslog to try to spot when it happened and
> what the reasons were. Or if you can't be bothered to do this, at least
> keep an eye on /proc/mdstat from now on (maybe using one of the
> monitoring scripts which are mentioned on this list from time to time),
> to make sure that you know if it happens again.

Well, I suspect it has to do with the 'writing beyond the end of a
device' error, which sounds like something I have no control over.
However, several people here install a bunch of software and generally
muck about with the machine, and I'll restrain them from now on. I'll
tool through dmesg and the syslogs to see if I can glean more than
that. Thanks a lot for your help!
-Michel
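
For anyone following the thread, the "keep an eye on /proc/mdstat" advice
above could be sketched roughly as below. This is only a minimal sketch,
not one of the monitoring scripts mentioned on the list. It assumes each
array's status carries a "[total/active]" counter such as "[2/1]", which
may not hold for every kernel version. The function name
degraded_arrays and the SAMPLE_MDSTAT text are made up for illustration.

```python
import re

# Hypothetical sample, NOT output captured from the machine in this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md2 : active raid1 sda4[0]
      1028032 blocks [2/1] [U_]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose [total/active] counter
    shows fewer active members than the array should have."""
    degraded = []
    current = None  # array whose status counter we are still looking for
    for line in mdstat_text.splitlines():
        name = re.match(r"(md\d+)\s*:", line)
        if name:
            current = name.group(1)
        # Matches "[2/1]" but not member tags such as "sda4[0]".
        counter = re.search(r"\[(\d+)/(\d+)\]", line)
        if counter and current is not None:
            total, active = int(counter.group(1)), int(counter.group(2))
            if active < total:
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real machine you would pass in the contents of /proc/mdstat instead
of the sample text, for example from a cron job that mails you when the
returned list is not empty.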