On Tue, Jan 26, 2010 at 04:03:20PM +0100, Gerrit Kühn wrote:
> On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
> <[email protected]> wrote about Re: ZFS "zpool replace" problems:
> JC> 2) How did you attach ad18? Did you tell the system about it using
> JC> atacontrol? If so, what commands did you use?
>
> Yes. The drives did not appear automatically (verified with atacontrol
> list). Then I first tried reinit ata9, but that did not work out, so I did
> a detach/attach for ata9, then the drive was there (with list and also
> the device node appeared).
The procedure -- at least on Intel controllers in AHCI mode -- is:

- zpool offline <pool> <disk>
- atacontrol detach ataX (where X = channel associated with disk)
- Physically remove bad disk
- Physically insert new disk
- Wait 15 seconds for stuff to settle
- atacontrol attach ataX (where X = previous channel detached)
- zpool replace <pool> <disk>
- zpool online <pool> <disk>

"reinit" shouldn't be needed at all -- in fact, I've seen reinit cause
some craziness (even on Intel controllers), including a system deadlock,
but this was back during the RELENG_6 and RELENG_7 days. Great
improvements have been made to ata(4) since then.

If you need me to validate the above procedure (it's been a while since
I've had to hot-swap a disk), I can do so. I do have a 4-disk Supermicro
SuperServer 5015B-MTB (ICH9-based) sitting on my workbench which I can
test with.

> Meanwhile I took out the ad18 drive again and tried to use a different
> drive. But that was listed as "UNAVAIL" with corrupted data by zfs.
> Probably it already branded the disk for resilvering and is looking for
> exactly this one now. I also put in the disk which caused the problem
> above again. The resilvering process started again, but very soon the
> drive got detached again resulting in the same situation I described above.

It honestly sounds like hot-swapping is causing some chaos on your system.
Are all of the controllers involved configured for AHCI? If not, physical
removal/insertion should be done only when the system power is off. If
so, mav@ or others may be able to help figure out what's going on in the
underlying ata(4) layer.

-- 
| Jeremy Chadwick [email protected] |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB |
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
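The hot-swap sequence in the reply above can be kept as a small shell function so the steps always run in the same order. This is only a sketch and was not posted in the thread: the names hotswap and run_step and the RUN variable are hypothetical. By default it just prints the commands (a dry run); with RUN=1 it executes them and pauses for the physical swap. It assumes the new disk comes back under the same device name (e.g. ad18 on ata9), as the one-argument "zpool replace <pool> <disk>" form implies, and it has not been tried on real hardware.

```shell
#!/bin/sh
# Sketch only: hotswap() prints (or, with RUN=1, executes) the hot-swap
# sequence for one disk on one ata(4) channel. hotswap, run_step and RUN
# are hypothetical names, not FreeBSD tools.

# Echo a command; execute it only when RUN=1.
run_step() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    fi
}

# hotswap <pool> <disk> <channel>, e.g. hotswap tank ad18 ata9
hotswap() {
    if [ $# -ne 3 ]; then
        echo "usage: hotswap <pool> <disk> <channel>" >&2
        return 64
    fi
    pool=$1 disk=$2 chan=$3

    run_step zpool offline "$pool" "$disk"
    run_step atacontrol detach "$chan"
    echo "# physically remove the bad disk and insert the new one"
    if [ "${RUN:-0}" = 1 ]; then
        # Wait for the operator before touching the channel again.
        printf 'Press Enter once the new disk is in place: '
        read -r _
    fi
    run_step sleep 15   # give the controller time to settle
    run_step atacontrol attach "$chan"
    run_step zpool replace "$pool" "$disk"
    run_step zpool online "$pool" "$disk"
}
```

Usage: "hotswap tank ad18 ata9" for a dry run, or "RUN=1 hotswap tank ad18 ata9" to execute. Printing by default is deliberate: given the detach/attach trouble described in this thread, it is worth reviewing the exact commands before running them.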
