Hi there,

Not sure if this is a known bug (or even if it's a bug at all), but ZFS
seems to get confused when several consecutive temporary disk faults
involve a hot spare. I couldn't find anything related to this on this
forum, so here goes:

I'm testing this on a Sun Blade 2000 hooked up to a T3 via STMS. The OS
version is snv_48.

This is a bit convoluted, so bear with me. The problem occurs when the
following sequence happens:

- a pool is created with a hot spare
- a data disk is faulted (so that the spare steps in)
- the data disk is brought back online
- the hot spare is faulted
- the hot spare is brought back online and detached from the pool (to stop
  it from acting as a spare for the data disk that faulted)
- the original data disk is faulted again

When the above takes place, the spare ends up completely replacing the
faulted data disk in the pool, but it still shows up under 'spares'. This
happens with mirror, raidz1 and raidz2 configurations.
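
For reference, here's the same sequence condensed into commands. The names
diskA, diskB and sparedisk below are just placeholders for the MPxIO device
paths used in the walkthrough, and the 'fault' steps are done by hiding the
LUN on the T3 rather than with any zpool command, so treat this as a sketch
of the procedure rather than a copy-and-paste reproduction:

# create a two-way mirror with one hot spare (placeholder device names)
zpool create tank mirror diskA diskB spare sparedisk

# fault diskB by hiding its LUN on the array -> sparedisk kicks in
# bring diskB back by unhiding the LUN -> pool goes ONLINE, spare stays INUSE

# fault sparedisk by hiding its LUN, bring it back, then detach it
zpool detach tank sparedisk

# fault diskB again -> sparedisk silently takes diskB's place in the mirror
zpool status tank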

On another note, when a disk is faulted the console output says
"AUTO-RESPONSE: No automated response will occur." Shouldn't this mention
that a hot spare will be brought in?
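
(If it helps, the full fault record behind that console message can be
pulled out of FMA with the standard fmdump/fmadm commands; nothing
ZFS-specific here, and <EVENT-ID> is just the UUID printed on the console:)

# show the detailed fault event behind a given console message
fmdump -v -u <EVENT-ID>

# list the resources FMA currently considers faulty
fmadm faulty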



Here's a walkthrough with a 2-way mirror (I'm 'faulting' the disks by
making them invisible to the host using the T3's LUN masking, then bringing
them back by making them visible again):

***********************
*** Create the pool ***
***********************

[EMAIL PROTECTED] zpool create tank mirror 
c5t60020F2000000A78450A91BE00088501d0 c5t60020F2000000A78450A918D0003BA4Ad0 
spare c5t60020F2000000A7845098A27000B9ED2d0
[EMAIL PROTECTED]
[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        tank                                       ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0  ONLINE       0     0     0
            c5t60020F2000000A78450A918D0003BA4Ad0  ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0    AVAIL

errors: No known data errors



******************************************
*** Fault a data disk (bring spare in) ***
******************************************

t3f1:/:<161>lun perm lun 4 none grp v4u2000a

<console output>
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 21 11:45:13 BST 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: v4u-2000a-gmp03
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 3eef63b6-061e-6039-e273-e06c9feb8475
DESC: A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for more 
information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

[EMAIL PROTECTED] zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Thu Sep 21 11:45:14 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         DEGRADED     0     0     0
          mirror                                     DEGRADED     0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            spare                                    DEGRADED     0     0     0
              c5t60020F2000000A78450A918D0003BA4Ad0  UNAVAIL      0    62     0  cannot open
              c5t60020F2000000A7845098A27000B9ED2d0  ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      INUSE     currently in use

errors: No known data errors



************************************
*** Bring data disk back online ***
************************************

t3f1:/:<162>lun perm lun 4 rw grp v4u2000a

[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Thu Sep 21 11:48:26 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror                                     ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            spare                                    ONLINE       0     0     0
              c5t60020F2000000A78450A918D0003BA4Ad0  ONLINE       0    62     0
              c5t60020F2000000A7845098A27000B9ED2d0  ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      INUSE     currently in use

errors: No known data errors



****************************
*** Fault the spare disk ***
****************************

t3f1:/:<163>lun perm lun 1 none grp v4u2000a

<console output>
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 21 11:51:24 BST 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: v4u-2000a-gmp03
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 9a80c89d-6633-e9ae-8315-d632cdb12406
DESC: A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for more 
information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

[EMAIL PROTECTED] zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Thu Sep 21 11:48:26 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         DEGRADED     0     0     0
          mirror                                     DEGRADED     0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            spare                                    DEGRADED     0     0     0
              c5t60020F2000000A78450A918D0003BA4Ad0  ONLINE       0    62     0
              c5t60020F2000000A7845098A27000B9ED2d0  UNAVAIL      0    62     0  cannot open
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      INUSE     currently in use

errors: No known data errors



*******************************************
*** Reconnect and detach the spare disk ***
*******************************************

t3f1:/:<164>lun perm lun 1 rw grp v4u2000a

[EMAIL PROTECTED] zpool detach tank c5t60020F2000000A7845098A27000B9ED2d0
[EMAIL PROTECTED]
[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Thu Sep 21 11:48:26 2006
config:

        NAME                                       STATE     READ WRITE CKSUM
        tank                                       ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0  ONLINE       0     0     0
            c5t60020F2000000A78450A918D0003BA4Ad0  ONLINE       0    62     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0    UNAVAIL   cannot open

errors: No known data errors



*************************************
*** Fault the original disk again ***
*************************************

t3f1:/:<165>lun perm lun 4 none grp v4u2000a

<console output>
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 21 11:59:31 BST 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: v4u-2000a-gmp03
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: d7c4ffa3-e7d3-41a8-cfbe-eecccb4bbe72
DESC: A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for more 
information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Sep 21 11:59:32 2006
config:

        NAME                                       STATE     READ WRITE CKSUM
        tank                                       ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0  ONLINE       0     0     0
            c5t60020F2000000A7845098A27000B9ED2d0  ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0    UNAVAIL   cannot open

errors: No known data errors



The faulted data disk disappears from the pool entirely and the spare takes
its place in the mirror, yet the same device still shows up under 'spares'!

Am I just misunderstanding the intended behaviour, or is something amiss
here?
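
One thing I haven't tried yet, but which might help narrow this down, is
dumping the vdev labels to see how the spare is actually recorded on disk.
This is just standard zdb usage; the path below is the spare from the
walkthrough above, with s0 assumed to be the slice holding the ZFS label:

# dump the vdev labels on the spare device to see whether it is recorded
# as a hot spare, as a mirror child, or both
zdb -l /dev/dsk/c5t60020F2000000A7845098A27000B9ED2d0s0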

Cheers,
Liam
 
 