On Monday, 17 March 2003 at 10:58:28 +0000, Scott Mitchell wrote:
> On Mon, Mar 10, 2003 at 11:15:32PM +0000, Scott Mitchell wrote:
>> Hi all,
>>
>> I wonder if anyone out there can shed any light on this:
>>
>> A drive failed on one of our Vinum-powered RAID-5 arrays over the weekend.
>> This morning, we swapped out the offending drive (hot-swappable SCSI
>> hardware), disklabel-ed it and restarted the offending subdisk. Everything
>> seemed fine at this point, with vinum happily reviving the stale subdisk.
>>
>> However, twenty minutes later, with the revive 29% complete, I got this in
>> /var/log/messages:
>>
>> Mar 10 11:39:50 kokako vinum[12708]: can't revive raid.p0.s0: Invalid argument
>>
>> 'vinum list' was also showing an error message, which I foolishly didn't
>> capture, something along the lines of 'the revive process died'. Lacking
>> any better ideas, I started the subdisk again. The revival seemed to pick
>> up where it left off.
>>
>> Half an hour later, the box rebooted :-( I wasn't actually watching it at
>> the time, so I don't know if it finished reviving the subdisk or not.
>> There's no indication in the logs as to what happened, but the timing of
>> the reboot is consistent with it happening around the time the subdisk
>> would have come back to life.
>>
>> Once the box came back up, I restarted the subdisk yet again (I had to
>> create the drive again first), with the RAID volume unmounted. This time
>> the process finished without complaints and things seem to be working as
>> well as ever since then.
> [logs, etc. snipped...]
>
>
> No takers?

I've been intending to do so, but there's not much I can do based on
the information you've supplied.

> Maybe someone who's done this (replacing a failed Vinum drive on
> hot-swap SCSI hardware) before can at least tell me whether:
>
>   - I should have done some camcontrol magic before rebuilding
>     the drive?

I can't see anything in particular you would need to do, but then I
haven't seen the details.

>   - Rebuilding the drive without unmounting the volume first was
>     just asking for trouble?

There have been reports of this kind of problem, mainly from Vallo
Kallaste, who has also responded. I haven't seen it myself, and I
haven't heard of panics as a result. But yes, umounting is a good
precaution.

>   - -hackers or even -stable is a better venue for this kind of problem?

-questions will do fine.

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply or reply to the original recipients.
For more information, see http://www.lemis.com/questions.html
See complete headers for address and phone numbers