Hi, we have Lustre 2.7.58 running on our OST and on our combined MDT/MGS. Underneath the Lustre file system is a RAID-Z2 ZFS pool.

A few days ago, we lost 2 disks at once from the RAID-Z2. I replaced one and a resilver started, but it seemed to choke. So I replaced both disks, and the new resilver now shows the following:

[root@umdist03 ~]# zpool status -v ost-007
  pool: ost-007
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 972G in 9h25m with 1 errors on Fri Mar 11 19:12:37 2016
config:

        NAME                                  STATE     READ WRITE CKSUM
        ost-007                               DEGRADED     0     0     1
          raidz2-0                            DEGRADED     0     0     4
            replacing-0                       DEGRADED     0     0     0
              18280868502819750645            UNAVAIL      0     0     0  was /dev/disk/by-path/pci-0000:0c:00.0-scsi-0:2:20:0-part1/old
              pci-0000:0c:00.0-scsi-0:2:20:0  ONLINE       0     0     0
            pci-0000:0c:00.0-scsi-0:2:21:0    ONLINE       0     0     0
            pci-0000:0c:00.0-scsi-0:2:22:0    ONLINE       0     0     0
            pci-0000:0c:00.0-scsi-0:2:23:0    ONLINE       0     0     0
            pci-0000:0c:00.0-scsi-0:2:24:0    ONLINE       0     0     0
            pci-0000:0c:00.0-scsi-0:2:35:0    ONLINE       0     0     0
            pci-0000:0c:00.0-scsi-0:2:36:0    ONLINE       1     0     0
            pci-0000:0c:00.0-scsi-0:2:37:0    ONLINE       0     0     0
            pci-0000:0c:00.0-scsi-0:2:38:0    ONLINE       0     0     0
            replacing-9                       UNAVAIL      0     0     0
              14369532488179106769            UNAVAIL      0     0     0  was /dev/disk/by-path/pci-0000:0c:00.0-scsi-0:2:39:0-part1/old
              pci-0000:0c:00.0-scsi-0:2:39:0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        ost-007/ost0030:<0x2c90f>

What are my options here? If I don't care about the file, can I identify it and simply delete it? Or is my only real option to drain the pool and rebuild it cleanly?
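For reference, this is a rough sketch of how I'd pull the dataset and object number out of the "Permanent errors" list so they can be fed to something like zdb. It assumes the `<dataset>:<0xhex>` entry format shown above, and it assumes (I haven't confirmed this for our zdb version) that zdb wants the object number in decimal; 0x2c90f works out to 182543. Entries reported as plain file paths are deliberately ignored.

```python
import re

# Matches object-style entries such as "ost-007/ost0030:<0x2c90f>" from the
# "errors:" section of `zpool status -v` output.
ERROR_RE = re.compile(r"^\s*(?P<dataset>[^\s:]+):<0x(?P<obj>[0-9a-fA-F]+)>\s*$")


def parse_permanent_errors(status_text):
    """Return (dataset, decimal_object_id) tuples for each object-style error entry."""
    results = []
    in_errors = False
    for line in status_text.splitlines():
        if line.startswith("errors:"):
            # Everything after this header is the list of damaged files/objects.
            in_errors = True
            continue
        if not in_errors:
            continue
        m = ERROR_RE.match(line)
        if m:
            results.append((m.group("dataset"), int(m.group("obj"), 16)))
    return results


if __name__ == "__main__":
    sample = """errors: Permanent errors have been detected in the following files:

        ost-007/ost0030:<0x2c90f>
"""
    for dataset, obj in parse_permanent_errors(sample):
        # Candidate inspection command; decimal object id is an assumption.
        print(f"zdb -dddd {dataset} {obj}")
```

Run against the output above, this would print `zdb -dddd ost-007/ost0030 182543`.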

Thanks for any help/advice.

bob
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
