Hi there,

Not sure if this is a known bug (or even if it's a bug at all), but ZFS seems to get confused when several consecutive temporary disk faults involving a hot spare occur. I couldn't find anything related to this on this forum, so here goes:
I'm testing this on a SunBlade 2000 hooked up to a T3 via STMS. The OS version is snv48.

This is a bit confusing, so bear with me. Basically, the problem occurs when the following happens:

- a pool is created with a hot spare
- a data disk is faulted (so that the spare steps in)
- the data disk is brought back online
- the hot spare is faulted
- the hot spare is brought back online and detached from the pool (to stop it from acting as a spare for the data disc that faulted)
- the original data disc is faulted again

When the above takes place, the spare ends up completely replacing the data disc in the pool, but it still shows up as a spare. This occurs with mirror, raidz1 and raidz2 volumes.

On another note, when a disk is faulted, the console output says "AUTO-RESPONSE: No automated response will occur." Shouldn't this mention that a hot spare action will happen?

Here's a walkthrough with a 2-way mirror (I'm 'faulting' the discs by making them invisible to the host using the T3's LUN masking, then bringing them back by making them visible again):

*****************
***create pool***
*****************

[EMAIL PROTECTED] zpool create tank mirror c5t60020F2000000A78450A91BE00088501d0 c5t60020F2000000A78450A918D0003BA4Ad0 spare c5t60020F2000000A7845098A27000B9ED2d0
[EMAIL PROTECTED]
[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror                                     ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            c5t60020F2000000A78450A918D0003BA4Ad0    ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      AVAIL

errors: No known data errors

****************************************
***fault a data disc (bring spare in)***
****************************************

t3f1:/:<161>lun perm lun 4 none grp v4u2000a

<console output>
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 21 11:45:13 BST 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: v4u-2000a-gmp03
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 3eef63b6-061e-6039-e273-e06c9feb8475
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

[EMAIL PROTECTED] zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Thu Sep 21 11:45:14 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         DEGRADED     0     0     0
          mirror                                     DEGRADED     0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            spare                                    DEGRADED     0     0     0
              c5t60020F2000000A78450A918D0003BA4Ad0  UNAVAIL      0    62     0  cannot open
              c5t60020F2000000A7845098A27000B9ED2d0  ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      INUSE     currently in use

errors: No known data errors

************************************
*** Bring data disc back online ***
************************************

t3f1:/:<162>lun perm lun 4 rw grp v4u2000a

[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Thu Sep 21 11:48:26 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror                                     ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            spare                                    ONLINE       0     0     0
              c5t60020F2000000A78450A918D0003BA4Ad0  ONLINE       0    62     0
              c5t60020F2000000A7845098A27000B9ED2d0  ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      INUSE     currently in use

errors: No known data errors

****************************
*** Fault the spare disc ***
****************************

t3f1:/:<163>lun perm lun 1 none grp v4u2000a

<console output>
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 21 11:51:24 BST 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: v4u-2000a-gmp03
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 9a80c89d-6633-e9ae-8315-d632cdb12406
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

[EMAIL PROTECTED] zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Thu Sep 21 11:48:26 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         DEGRADED     0     0     0
          mirror                                     DEGRADED     0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            spare                                    DEGRADED     0     0     0
              c5t60020F2000000A78450A918D0003BA4Ad0  ONLINE       0    62     0
              c5t60020F2000000A7845098A27000B9ED2d0  UNAVAIL      0    62     0  cannot open
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      INUSE     currently in use

errors: No known data errors

*******************************************
*** Reconnect and detach the spare disc ***
*******************************************

t3f1:/:<164>lun perm lun 1 rw grp v4u2000a

[EMAIL PROTECTED] zpool detach tank c5t60020F2000000A7845098A27000B9ED2d0
[EMAIL PROTECTED]
[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Thu Sep 21 11:48:26 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror                                     ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            c5t60020F2000000A78450A918D0003BA4Ad0    ONLINE       0    62     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      UNAVAIL   cannot open

errors: No known data errors

*************************************
*** Fault the original disc again ***
*************************************

t3f1:/:<165>lun perm lun 4 none grp v4u2000a

<console output>
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 21 11:59:31 BST 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: v4u-2000a-gmp03
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: d7c4ffa3-e7d3-41a8-cfbe-eecccb4bbe72
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

[EMAIL PROTECTED] zpool status
  pool: tank
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Sep 21 11:59:32 2006
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror                                     ONLINE       0     0     0
            c5t60020F2000000A78450A91BE00088501d0    ONLINE       0     0     0
            c5t60020F2000000A7845098A27000B9ED2d0    ONLINE       0     0     0
        spares
          c5t60020F2000000A7845098A27000B9ED2d0      UNAVAIL   cannot open

errors: No known data errors

The faulted data disc disappears completely and the spare takes its place, but the spare still shows up as a spare! Am I just misunderstanding what is intended behaviour, or is something amiss here?

Cheers,
Liam

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
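For anyone wanting to spot the same inconsistency on other pools, the check can be sketched as a small POSIX sh/awk function that reads "zpool status" output and prints any device that sits in the data vdev tree while also being listed under "spares" with a state other than INUSE, which is exactly what the final status in the walkthrough shows. This is a hypothetical helper (spare_conflicts is not a ZFS command), and it assumes Solaris cXtYdZ device names and the zpool status layout seen in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): spare_conflicts reads `zpool status`
# output on stdin and prints every device that is a member of the data vdev
# tree while also being listed under "spares" with a state other than INUSE.
# Assumes Solaris cXtYdZ device names and the zpool status layout in this post.
spare_conflicts() {
    awk '
    # Enter the config section; leave it at the "errors:" line.
    /^[ \t]*config:/ { inconfig = 1; next }
    /^[ \t]*errors:/ { inconfig = 0; inspares = 0; next }
    # Everything after the "spares" heading is a spare entry.
    inconfig && $1 == "spares" { inspares = 1; next }
    # Record devices: spare entries keep their state, others are data members.
    inconfig && $1 ~ /^c[0-9]+(t[0-9A-Fa-f]+)?d[0-9]+(s[0-9]+)?$/ {
        if (inspares) spare_state[$1] = $2
        else in_data[$1] = 1
    }
    END {
        # A data-tree device that is also listed as a spare which is not
        # INUSE is the inconsistency described in the post.
        for (dev in spare_state)
            if ((dev in in_data) && spare_state[dev] != "INUSE")
                print dev
    }'
}
```

Run as "zpool status tank | spare_conflicts". Against the final status in the walkthrough it would print c5t60020F2000000A7845098A27000B9ED2d0, while the earlier states (spare attached under a "spare" vdev and marked INUSE, or listed only under spares after the detach) produce no output. Keying on the INUSE state rather than on indentation keeps the parsing simple, at the cost of depending on the state strings shown here.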