Platform:

  - old Dell workstation with an Andataco GigaRAID enclosure
    plugged into an Adaptec 39160
  - Nevada b51

Current zpool config:

   - one two-disk mirror with two hot spares (created roughly as
     sketched below)
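
For reference, a pool with this layout could be created with something
like the following; the device names are taken from the zpool status
output further down, and the exact original command line may well have
differed:

        # two-disk mirror plus two hot spares
        zpool create zmir mirror c3t3d0 c3t4d0 spare c0t0d0 c3t1d0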

In my ferocious pounding of ZFS I've managed to corrupt my data
pool. This is what I've been doing to test it (rough commands are
sketched after the list):

   - set zil_disable to 1 in /etc/system
   - continually untar a couple of files into the filesystem
   - manually spin down a drive in the mirror by holding down
     the button on the enclosure
   - for any system hang, reboot with a nasty

          reboot -dnq
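
The /etc/system setting and the load loop look roughly like this; the
mount point and tarball name are placeholders, and the zfs: prefix on
the tunable is my understanding of the syntax:

        * in /etc/system (needs a reboot to take effect)
        set zfs:zil_disable = 1

        # crude load generator run from a shell; /zmir/fs and
        # files.tar are placeholders
        cd /zmir/fs
        while true; do
                tar xf /var/tmp/files.tar
        done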

I've gotten different results after the spindown:

   - works properly: short or no hang, hot spare successfully
     added to the mirror
   - system hangs, and after a reboot the spare is not added
   - tar hangs, but after running "zpool status" the hot
     spare is added properly and tar continues
   - tar continues, but hangs on "zpool status"

The last is what happened just prior to the corruption. Here's the output
of zpool status:

nextest-01# zpool status -v
  pool: zmir
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 1 errors on Thu Nov 30 11:37:21 2006
config:

        NAME        STATE     READ WRITE CKSUM
        zmir        DEGRADED     8     0     4
          mirror    DEGRADED     8     0     4
            c3t3d0  ONLINE       0     0    24
            c3t4d0  UNAVAIL      0     0     0  cannot open
        spares
          c0t0d0    AVAIL
          c3t1d0    AVAIL

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          15       0       lvl=4294967295 blkid=0

So the questions are:

   - is this fixable? I don't see an inum I could run find on to locate
     and remove the affected file, and I can't even do a zfs volinit
     anyway:

        nextest-01# zfs volinit
        cannot iterate filesystems: I/O error

   - would leaving the ZIL enabled (not setting zil_disable) have
     prevented this?

   - should I have been using a 3-way mirror?

   - is there a better configuration that would help prevent this
     kind of corruption? (I've sketched what I'm considering below.)
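
For what it's worth, here is roughly what I'm considering trying next.
I'm not certain every subcommand behaves this way (or even exists) on
b51, and the device names are only examples pulled from the status
output above, so treat this as a sketch rather than a recipe:

        # find out which filesystem dataset 15 actually is
        zdb -d zmir

        # clear the error counts and re-scrub to see whether the
        # damage is persistent
        zpool clear zmir
        zpool scrub zmir
        zpool status -v zmir

        # going forward, promote one of the spares to a third side of
        # the mirror (presumably it has to be dropped from the spares
        # list first)
        zpool remove zmir c3t1d0
        zpool attach zmir c3t3d0 c3t1d0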

Ultimately, I want to build a ZFS server with performance and reliability
comparable to, say, a NetApp, but the fact that I appear to have been
able to nuke my pool by simulating a hardware error gives me pause.

I'd love to know if I'm off-base in my worries.

Jim
 
 