Update on the problem below. The issue seems to have been caused by moving disks between direct-attach SATA and an LSI SAS expansion card, which changes the device path and produces "invalid vdev configuration" errors.

The original disk is now back, but in swapping disks to fix this I now have a raidz pool reporting FAULTED for one of the disks that moved from direct-attach to behind the SAS card. Notice in the output below that both pool14a (working) and the faulted disk in raidpool report the disk name as "c11t0d0".
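In case it is useful, the overlap can be seen by checking which physical device path each /dev/dsk name currently resolves to. This is just a read-only check, assuming the usual /dev/dsk -> /devices symlink layout:

    # Show the physical /devices path behind each device name; raidpool's
    # config still records c11t0d0 from before the move, while the pool14a
    # disk now occupies that name.
    ls -l /dev/dsk/c11t0d0s0 /dev/dsk/c1t5000C500C6E6F681d0s0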

Is there a clean way to fix the disk in raidpool? I tried "zpool replace raidpool c11t0d0 c1t5000C500C6E6F681d0" with and without "-f", but this fails, for example:

    # zpool replace -f raidpool c11t0d0 c1t5000C500C6E6F681d0
    invalid vdev specification
    the following errors must be manually repaired:
    /dev/dsk/c1t5000C500C6E6F681d0s0 is part of active ZFS pool
    raidpool. Please see zpool(1M).

Obviously I know it's supposed to be part of this "raidpool", but I cannot detach it. I am reluctant to try to nuke the disk label, and it's not clear which option would work.
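Before doing anything destructive I would probably inspect the on-disk labels first; a sketch of what I have in mind, using the same s0 slice the error message refers to:

    # Read-only: print the ZFS labels on the disk, showing which pool GUID
    # and vdev GUID the disk still claims to belong to.
    zdb -l /dev/dsk/c1t5000C500C6E6F681d0s0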

FYI, the errors in /var/adm/messages seem to be:

Dec 23 08:56:57 zbackup zfs: [ID 101897 kern.notice] NOTICE: vdev_disk_open /dev/dsk/c11t0d0s0: update devid from 'id1,sd@SATA_____ST4000VN008-2DR1____________ZGY7RTPN/a' to 'id1,sd@SATA_____ST14000NM001G-2K____________ZL20DAZ9/a'
Dec 23 08:56:57 zbackup zfs: [ID 844310 kern.notice] NOTICE: vdev_disk_open /dev/dsk/c11t0d0s0: devid mismatch: id1,sd@SATA_____ST4000VN008-2DR1____________ZGY7RTPN/a != id1,sd@SATA_____ST14000NM001G-2K____________ZL20DAZ9/a
Dec 23 08:56:57 zbackup zfs: [ID 101897 kern.notice] NOTICE: vdev_disk_open /dev/dsk/c11t0d0s0: update devid from 'id1,sd@SATA_____ST4000VN008-2DR1____________ZGY7RTPN/a' to 'id1,sd@SATA_____ST14000NM001G-2K____________ZL20DAZ9/a'


Thanks,

Hugh.

# zpool status
  pool: pool14a
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool14a     ONLINE       0     0     0
          c11t0d0   ONLINE       0     0     0

errors: No known data errors

  pool: raidpool
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool
        to continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub in progress since Wed Dec 23 08:57:06 2020 <...>
config:

        NAME                       STATE     READ WRITE CKSUM
        raidpool                   DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            c1t50014EE2B4ED7831d0  ONLINE       0     0     0
            c11t0d0                FAULTED      0     0     0  corrupted data
            c11t4d0                ONLINE       0     0     0

errors: No known data errors




On 12/21/20 5:45 PM, Hugh McIntyre wrote:

I have a single-disk zpool whose disk was temporarily removed, without "zpool export", to make room for a new disk.  The system was cleanly shut down before removal, but the pool was not exported.  This disk was using a device name such as "c11t0d0".

Subsequently, while writing to a new pool in the same disk position (but a new disk and pool name), the system started to generate data corruption errors for the removed pool, such as "pool10a: disk c11t0d0: corrupt data" (wording may not match).  So I tried to make the system stop trying to access this pool with "zpool destroy -f pool10a", and this stopped the errors.

The problem is that now the pool will not re-import, because zpool import says:

    pool: pool10a
      id: 11135625420108541132
   state: UNAVAIL
  status: One or more devices contains corrupted data.
  action: The pool cannot be imported due to damaged devices or data.
    see: http://illumos.org/msg/ZFS-8000-5E
  config:

         pool10a                  UNAVAIL  insufficient replicas
           c1t5000CCA266D6BC8Bd0  UNAVAIL  corrupted data

It seems unlikely the disk is actually corrupt, because it was physically detached at the time.  The exception would be if the system wrote some cached data after reattachment, which also seems unlikely because the pool is not imported.

Is this an issue with zpool.cache or some other saved state, and if so is this fixable?  There have been other changes since this pool was last mounted so I don't think I can just use an old cache file.  Are there other options, such as rebooting with the cache file nuked?
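If the on-disk labels are actually fine, I was thinking along these lines; a sketch only, and the read-only import option is an assumption on my part that this build supports it:

    # Scan /dev/dsk explicitly and import by the numeric pool id to avoid
    # any ambiguity over the name:
    zpool import -d /dev/dsk 11135625420108541132

    # Or, more cautiously, attempt a read-only import first:
    zpool import -o readonly=on -d /dev/dsk 11135625420108541132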

If this is not fixable I will nuke and re-create, but I would prefer to get the disk back as-is if this is possible.

Thanks,

Hugh.
