Hi

FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: Tue Feb 13 18:24:34 CET 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ROUTER.POLLING i386

(ROUTER.POLLING is GENERIC plus options DEVICE_POLLING, ALTQ and IPSEC, as well as pfsync and carp.)

This weekend a disk failed on me in a machine running a gmirror, gm0, with two providers (ad0 and ad6). The whole box froze with no screen output, and on a hard reboot I got some LBA errors from ad0. After a few reboots it came up and ran, though (I wasn't at the screen and had to do it by phone, so I couldn't really debug very well). As soon as the box was up, I removed ad0 from the gmirror, leaving ad6 as the only provider.

Today I got a new disk to replace ad0. Now remember, ad6 was the only disk in the mirror. I took the box down cleanly and replaced the disk. ad0 was now gone and instead I had ad4 (ad4 and ad6 are SATA, ad0 was IDE). I changed the boot order so that I booted off the old SATA disk.

That is where the first problem came: the boot loader gave me the usual options, F1 FreeBSD and F5 Disk 2 (or whatever it said). If I pressed F1 I got the same prompt again; F5 did nothing at all. Funny! The system refused to load the loader (or whatever the 1-9 menu thingy is called), the kernel, or anything else. So I finally plugged the old ad0 disk back into the machine to at least get it booted, thinking it would come up on the gmirror. Nope:

(I had taken the new ad4 out at this point)
ad0: 38166MB <WDC WD400BB-00CAA1 17.07W17> at ata0-master UDMA100
ad6: 152627MB <SAMSUNG HD160JJ ZM100-41> at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=4029378995).
GEOM_MIRROR: Device gm0: provider ad6 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
Trying to mount root from ufs:/dev/mirror/gm0s1a

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       eg. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot>

Okay... so why wouldn't it load my mirror from ad6 now? I had just done a clean shutdown without problems. It didn't even recognize any slices on ad6s1 (although ad6s1 itself was found). I entered ad0s1 as root and booted from there. Of course I ended up in the emergency shell, since fstab looked for the gmirror devices, which didn't exist.

After some more digging into gmirror, I did a gmirror dump ad6:

Metadata on /dev/ad6:
     magic: GEOM::MIRROR
   version: 3
      name: gm0
       mid: 4029378995
       did: 449032193
       all: 3
     genid: 0
    syncid: 5
  priority: 0
     slice: 4096
   balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
    mflags: NONE
    dflags: SYNCHRONIZING
hcprovider:
  provsize: 160041885696
  MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f

Some googling indicated that SYNCHRONIZING means that it's not "complete" and won't mount. Is that correct? Why would it be in that state, when I had just shut it down cleanly? And where the f*ck did my slices go?
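
For anyone who wants to check the same fields on their own dump, here is a minimal sketch in Python that parses the text of the gmirror dump output above into a dictionary and prints the disk flags and the two sizes. The helper name parse_gmirror_dump is hypothetical (it is not part of gmirror), the sample text is simply copied from the dump in this mail, and splitting the flags on whitespace is an assumption about how several flags would be printed.

```python
# Sketch: parse `gmirror dump` text into a dict and inspect a few fields.
# SAMPLE is copied from the dump in this mail. parse_gmirror_dump is a
# hypothetical helper for illustration, not something gmirror provides.

SAMPLE = """\
Metadata on /dev/ad6:
     magic: GEOM::MIRROR
   version: 3
      name: gm0
       mid: 4029378995
       did: 449032193
       all: 3
     genid: 0
    syncid: 5
  priority: 0
     slice: 4096
   balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
    mflags: NONE
    dflags: SYNCHRONIZING
hcprovider:
  provsize: 160041885696
  MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
"""


def parse_gmirror_dump(text):
    """Return the 'key: value' lines of a gmirror dump as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so "GEOM::MIRROR" stays intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines without a colon and the "Metadata on ..." header line.
        if not sep or key.startswith("Metadata on"):
            continue
        fields[key] = value.strip()
    return fields


meta = parse_gmirror_dump(SAMPLE)

# Assumption: if several flags are set, they are separated by whitespace.
dflags = meta["dflags"].split()
print("disk flags:", dflags)
print("SYNCHRONIZING set:", "SYNCHRONIZING" in dflags)

GIB = 2 ** 30
print("mediasize: %.2f GiB" % (int(meta["mediasize"]) / GIB))
print("provsize:  %.2f GiB" % (int(meta["provsize"]) / GIB))
```

This only reads what the dump already says; it does not tell you why the flag is set or how to clear it.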

I did a sysctl kern.geom.mirror.debug=2 and tried to gmirror activate the mirror:

GEOM_MIRROR[1]: Creating device gm0 (id=4029378995).
GEOM_MIRROR[0]: Device gm0 created (id=4029378995).
GEOM_MIRROR[1]: root_mount_hold 0xc3539510
GEOM_MIRROR[1]: Adding disk ad6 to gm0.
GEOM_MIRROR[2]: Adding disk ad6.
GEOM_MIRROR[2]: Disk ad6 connected.
GEOM_MIRROR[1]: Disk ad6 state changed from NONE to NEW (device gm0).
GEOM_MIRROR[0]: Device gm0: provider ad6 detected.
GEOM_MIRROR[2]: Tasting ad6s1.
GEOM_MIRROR[0]: Force device gm0 start due to timeout.
GEOM_MIRROR[1]: root_mount_rel[2169] 0xc3539510
GEOM_MIRROR[2]: No I/O requests for gm0, it can be destroyed.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[2]: Access ad6 r-1w-1e-1 = 0
GEOM_MIRROR[0]: Device gm0 destroyed.
GEOM_MIRROR[1]: Thread exiting.
GEOM_MIRROR[1]: Consumer ad6 destroyed.


So... what is going on here? Does anyone have a clue? I'm currently running on the ad0 disk, with no RAID at all. Let's hope it doesn't die on me (I haven't seen any signs of that since Sunday, when it froze and gave the boot errors, so I'm hoping). The data loss from using ad0 instead of ad6 is probably minimal; it's a router, so more or less only logging seems to have been lost. For now I just want to understand what happened here and how to prevent it, and how to get back onto a gmirror with ad6 and ad4 (to be plugged in) so I can throw ad0 out.


Thanks

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/

