Update on the problem below. The issue seems to have been caused by moving disks between direct-attach SATA and an LSI SAS expansion card, which changes the device path and produces "invalid vdev configuration" errors.

The original disk is now back, but in swapping disks to fix this I now have a raidz pool reporting FAULTED for one of the disks that moved from direct-attach to behind the SAS card. Notice in the output below that both pool14a (working) and the faulted disk in raidpool report the disk name as "c11t0d0".
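In case it is useful, the overlap can be seen by checking which physical device path each /dev/dsk name currently resolves to. This is just a read-only check, assuming the usual /dev/dsk -> /devices symlink layout:

    # Show the physical /devices path behind each device name; raidpool's
    # config still records c11t0d0 from before the move, while the pool14a
    # disk now occupies that name.
    ls -l /dev/dsk/c11t0d0s0 /dev/dsk/c1t5000C500C6E6F681d0s0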

Is there a clean way to fix the disk in raidpool? I tried "zpool replace raidpool c11t0d0 c1t5000C500C6E6F681d0" with and without "-f", but this fails, for example:

    # zpool replace -f raidpool c11t0d0 c1t5000C500C6E6F681d0
    invalid vdev specification
    the following errors must be manually repaired:
    /dev/dsk/c1t5000C500C6E6F681d0s0 is part of active ZFS pool
    raidpool. Please see zpool(1M).

Obviously I know it's supposed to be part of this "raidpool", but I cannot detach it. I am reluctant to try to nuke the disk label, and it's not clear which option would work.
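Before doing anything destructive I would probably inspect the on-disk labels first; a sketch of what I have in mind, using the same s0 slice the error message refers to:

    # Read-only: print the ZFS labels on the disk, showing which pool GUID
    # and vdev GUID the disk still claims to belong to.
    zdb -l /dev/dsk/c1t5000C500C6E6F681d0s0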

FYI, the errors in /var/adm/messages seem to be:

Dec 23 08:56:57 zbackup zfs: [ID 101897 kern.notice] NOTICE: vdev_disk_open /dev/dsk/c11t0d0s0: update devid from 'id1,sd@SATA_____ST4000VN008-2DR1____________ZGY7RTPN/a' to 'id1,sd@SATA_____ST14000NM001G-2K____________ZL20DAZ9/a'
Dec 23 08:56:57 zbackup zfs: [ID 844310 kern.notice] NOTICE: vdev_disk_open /dev/dsk/c11t0d0s0: devid mismatch: id1,sd@SATA_____ST4000VN008-2DR1____________ZGY7RTPN/a != id1,sd@SATA_____ST14000NM001G-2K____________ZL20DAZ9/a
Dec 23 08:56:57 zbackup zfs: [ID 101897 kern.notice] NOTICE: vdev_disk_open /dev/dsk/c11t0d0s0: update devid from 'id1,sd@SATA_____ST4000VN008-2DR1____________ZGY7RTPN/a' to 'id1,sd@SATA_____ST14000NM001G-2K____________ZL20DAZ9/a'


Thanks,

Hugh.

# zpool status
  pool: pool14a
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool14a     ONLINE       0     0     0
          c11t0d0   ONLINE       0     0     0

errors: No known data errors

  pool: raidpool
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool
        to continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub in progress since Wed Dec 23 08:57:06 2020 <...>
config:

        NAME                       STATE     READ WRITE CKSUM
        raidpool                   DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            c1t50014EE2B4ED7831d0  ONLINE       0     0     0
            c11t0d0                FAULTED      0     0     0  corrupted data
            c11t4d0                ONLINE       0     0     0

errors: No known data errors




On 12/21/20 5:45 PM, Hugh McIntyre wrote:

I have a single-disk zpool whose disk was temporarily removed, without "zpool export", to make room for a new disk.  The system was cleanly shut down before removal, but the pool was not exported.  This disk was using a device name such as "c11t0d0".

Subsequently, while writing to a new pool in the same disk position (but a new disk and pool name), the system started to generate data corruption errors for the removed pool, such as "pool10a: disk c11t0d0: corrupt data" (wording may not match).  So I tried to make the system stop trying to access this pool with "zpool destroy -f pool10a", and this stopped the errors.

The problem is that now the pool will not re-import, because zpool import says:

    pool: pool10a
      id: 11135625420108541132
   state: UNAVAIL
  status: One or more devices contains corrupted data.
  action: The pool cannot be imported due to damaged devices or data.
    see: http://illumos.org/msg/ZFS-8000-5E
  config:

         pool10a                  UNAVAIL  insufficient replicas
           c1t5000CCA266D6BC8Bd0  UNAVAIL  corrupted data

It seems unlikely the disk is actually corrupt, because it was physically detached at the time.  The exception would be if the system wrote some cached data after reattachment, which also seems unlikely because the pool is not imported.

Is this an issue with zpool.cache or some other saved state, and if so is this fixable?  There have been other changes since this pool was last mounted so I don't think I can just use an old cache file.  Are there other options, such as rebooting with the cache file nuked?
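If the on-disk labels are actually fine, I was thinking along these lines; a sketch only, and the read-only import option is an assumption on my part that this build supports it:

    # Scan /dev/dsk explicitly and import by the numeric pool id to avoid
    # any ambiguity over the name:
    zpool import -d /dev/dsk 11135625420108541132

    # Or, more cautiously, attempt a read-only import first:
    zpool import -o readonly=on -d /dev/dsk 11135625420108541132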

If this is not fixable I will nuke and re-create, but I would prefer to get the disk back as-is if this is possible.

Thanks,

Hugh.
