Hi there,
Not sure if this is a known bug (or even if it's a bug at all), but ZFS seems
to get confused when several consecutive temporary disk faults occur involving
a hot spare. I couldn't find anything related to this on this forum, so here
goes:
I'm testing this on a SunBlade 2000 hooked up to a T3 via STMS. The OS version
is snv48.
This is a bit confusing, so bear with me. Basically, the problem occurs when
the following happens:
- a pool is created with a hot spare
- a data disk is faulted (so that the spare steps in)
- the data disk is brought back online
- the hot spare is faulted
- the hot spare is brought back online and detached from the pool (to stop it
from acting as a spare for the data disk that faulted)
- the original data disk is faulted again
When the above takes place, the spare ends up replacing the data disk
completely in the pool, but it still shows up as a spare. This occurs with
mirror, raidz1 and raidz2 volumes.
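For clarity, here's the same sequence as a command sketch. diskA/diskB/diskC
are placeholder names, and the "fault"/"restore" steps aren't commands at all
in my setup - they're done by masking/unmasking LUNs on the T3, as shown in
the walkthrough below:

```shell
zpool create tank mirror diskA diskB spare diskC
# fault diskB (mask its LUN)     -> diskC kicks in as hot spare
# restore diskB (unmask its LUN) -> resilver completes
# fault diskC, then restore it
zpool detach tank diskC          # stop it sparing for diskB
# fault diskB again              -> diskC replaces diskB in the mirror,
#                                   yet still shows up under 'spares'
```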
On another note, when a disk is faulted the console output says "AUTO-RESPONSE:
No automated response will occur." - shouldn't this mention that a hot spare
action will take place?
Here's a walkthrough with a 2-way mirror (I'm 'faulting' the disks by making
them invisible to the host using the T3's LUN masking, then bringing them back
by making them visible again):
***create pool***
[EMAIL PROTECTED] zpool create tank mirror \
    c5t60020F200A78450A91BE00088501d0 c5t60020F200A78450A918D0003BA4Ad0 \
    spare c5t60020F200A7845098A27000B9ED2d0
[EMAIL PROTECTED]
[EMAIL PROTECTED] zpool status
pool: tank
state: ONLINE
scrub: none requested
config:
        NAME                                  STATE     READ WRITE CKSUM
        tank                                  ONLINE       0     0     0
          mirror                              ONLINE       0     0     0
            c5t60020F200A78450A91BE00088501d0 ONLINE       0     0     0
            c5t60020F200A78450A918D0003BA4Ad0 ONLINE       0     0     0
        spares
          c5t60020F200A7845098A27000B9ED2d0   AVAIL

errors: No known data errors
***fault a data disk (bring spare in)***
t3f1:/:<161> lun perm lun 4 none grp v4u2000a
console output
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 21 11:45:13 BST 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: v4u-2000a-gmp03
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 3eef63b6-061e-6039-e273-e06c9feb8475
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more
information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
[EMAIL PROTECTED] zpool status
pool: tank
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-D3
scrub: resilver completed with 0 errors on Thu Sep 21 11:45:14 2006
config:
        NAME                                    STATE     READ WRITE CKSUM
        tank                                    DEGRADED     0     0     0
          mirror                                DEGRADED     0     0     0
            c5t60020F200A78450A91BE00088501d0   ONLINE       0     0     0
            spare                               DEGRADED     0     0     0
              c5t60020F200A78450A918D0003BA4Ad0 UNAVAIL      0    62     0  cannot open
              c5t60020F200A7845098A27000B9ED2d0 ONLINE       0     0     0
        spares
          c5t60020F200A7845098A27000B9ED2d0     INUSE     currently in use

errors: No known data errors
*** Bring data disk back online ***
t3f1:/:<162> lun perm lun 4 rw grp v4u2000a
[EMAIL PROTECTED] zpool status
pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver completed with 0 errors on Thu Sep 21 11:48:26 2006
config:
        NAME                                    STATE     READ WRITE CKSUM
        tank                                    ONLINE       0     0     0
          mirror                                ONLINE       0     0     0
            c5t60020F200A78450A91BE00088501d0   ONLINE       0     0     0
            spare                               ONLINE       0     0     0