This came to light because I was checking the mmbackup logs and found that we had not had a successful backup for several days, and were seeing lots of errors like this:

Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Error on gpfs_iopen([/gpfs/users/xxxyyyyy/.swr],68050746): Stale file handle
Wed Jun 19 21:45:28 2024 mmbackup:Error encountered in policy scan: [E] Summary of errors:: _dirscan failures:3, _serious unclassified errors:3.
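
Because mmbackup kept failing quietly, something watching its output might have caught this sooner. Below is a minimal Python sketch that pulls the failing paths and the summary counters out of log lines like the ones above. It assumes the message format shown here. The regular expressions and the function name `scan_mmbackup_log` are hypothetical and not part of mmbackup.

```python
import re
from collections import Counter

# Hypothetical patterns, assumed from the mmbackup log format quoted above.
IOPEN_ERR = re.compile(
    r"Error on gpfs_iopen\(\[(?P<path>[^\]]+)\],\d+\): (?P<msg>.+)$"
)
SUMMARY = re.compile(r"Summary of errors::\s*(?P<body>.+)$")


def scan_mmbackup_log(lines):
    """Return (list of (path, error text), Counter of summary counters)."""
    failures = []
    summary = Counter()
    for line in lines:
        m = IOPEN_ERR.search(line)
        if m:
            failures.append((m.group("path"), m.group("msg").strip()))
            continue
        m = SUMMARY.search(line)
        if m:
            # e.g. "_dirscan failures:3, _serious unclassified errors:3."
            for item in m.group("body").rstrip(".").split(","):
                name, _, count = item.strip().rpartition(":")
                if count.isdigit():
                    summary[name] += int(count)
    return failures, summary
```

After each nightly run, a non-empty `failures` list or a non-zero summary count could raise an alert, so nobody has to read the logs by hand.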

After some digging around, wondering what was going on, I came across these being logged on one of the DSS-G nodes:

[Wed Jun 12 22:22:05 2024] blk_update_request: I/O error, dev sdbv, sector 9144672512 op 0x1:(WRITE) flags 0x700 phys_seg 17 prio class 0
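
These messages had been in the kernel log for a week before the backup failures were spotted. Below is a minimal sketch that tallies them per block device and per operation. It assumes the message format quoted above. The pattern skips the `blk_update_request:` prefix on purpose, because other kernel versions may word that part differently. `tally_block_io_errors` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

# Matches the "I/O error, dev ..., sector ... op 0x..:(OP)" part of the
# kernel message shown above; any prefix before it is ignored.
BLK_IO_ERR = re.compile(
    r"I/O error, dev (?P<dev>\w+), sector \d+ op 0x[0-9a-fA-F]+:\((?P<op>\w+)\)"
)


def tally_block_io_errors(lines):
    """Count block-layer I/O errors per device, and per operation per device."""
    per_device = Counter()
    ops = defaultdict(Counter)
    for line in lines:
        m = BLK_IO_ERR.search(line)
        if m:
            dev = m.group("dev")
            per_device[dev] += 1
            ops[dev][m.group("op")] += 1
    return per_device, ops
```

You could run it on each DSS-G node with the input `subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout.splitlines()`. That should show `sdbv` accumulating WRITE errors, and the device name can then be matched against the pdisk device paths.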

Yikes, looks like I have a failed disk. However, if I do:

[root@gpfs2 ~]# mmvdisk pdisk list --recovery-group all --not-ok
mmvdisk: All pdisks are ok.

Clearly that's a load of rubbish.

After a lot more prodding:

[root@gpfs2 ~]# mmvdisk pdisk list --recovery-group dssg2 --pdisk e1d2s25 -L
pdisk:
   replacementPriority = 1000
   name = "e1d2s25"
   device = "//gpfs1/dev/sdft(notEnabled),//gpfs1/dev/sdfu(notEnabled),//gpfs2/dev/sdfb,//gpfs2/dev/sdbv"
   recoveryGroup = "dssg2"
   declusteredArray = "DA1"
   state = "ok"
   IOErrors = 444
   IOTimeouts = 8958
   mediaErrors = 15


What on earth gives? Why has the disk not been failed? It's not great that a clearly bad disk is allowed to stick around in the file system and cause problems IMHO.
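
`--not-ok` evidently keys off the pdisk state and not the error counters, so a check on the counters themselves would have flagged this disk. Below is a minimal sketch that assumes the `key = value` layout shown above for `mmvdisk pdisk list -L`. The real output carries more fields, and the sketch ignores them. It also ties a kernel device name such as `sdbv` back to its pdisk. The function names and the default zero threshold are hypothetical choices, not anything mmvdisk provides.

```python
import re

# "key = value" lines, with or without double quotes around the value.
FIELD = re.compile(r'^\s*(?P<key>\w+)\s*=\s*"?(?P<val>[^"]*)"?\s*$')
COUNTERS = ("IOErrors", "IOTimeouts", "mediaErrors")


def parse_pdisks(text):
    """Split `mmvdisk pdisk list -L` output into one dict per "pdisk:" stanza."""
    pdisks, current = [], None
    for line in text.splitlines():
        if line.strip() == "pdisk:":
            current = {}
            pdisks.append(current)
            continue
        m = FIELD.match(line)
        if m and current is not None:
            current[m.group("key")] = m.group("val")
    return pdisks


def suspect_pdisks(pdisks, threshold=0):
    """Return (name, counters) for pdisks in state "ok" with counters above threshold."""
    flagged = []
    for pd in pdisks:
        counts = {k: int(pd[k]) for k in COUNTERS if pd.get(k, "").isdigit()}
        if pd.get("state") == "ok" and any(v > threshold for v in counts.values()):
            flagged.append((pd.get("name"), counts))
    return flagged


def pdisk_for_device(pdisks, server, dev):
    """Return the name of the pdisk with a path //server/dev/<dev>, else None."""
    wanted = f"//{server}/dev/{dev}"
    for pd in pdisks:
        # Paths are comma-separated and may carry a "(notEnabled)" suffix.
        paths = [p.split("(")[0] for p in pd.get("device", "").split(",")]
        if wanted in paths:
            return pd.get("name")
    return None
```

Feed the output above to `parse_pdisks`. `suspect_pdisks` would then report e1d2s25 with its 444 I/O errors, and `pdisk_for_device(pdisks, "gpfs2", "sdbv")` would return e1d2s25, matching the kernel messages. A zero threshold is deliberately blunt. A site would probably tune it, since an occasional transient timeout may not mean a disk is failing.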

When I try to prepare the disk for removal, I get:

[root@gpfs2 ~]# mmvdisk pdisk replace --prepare --rg dssg2 --pdisk e1d2s25
mmvdisk: Pdisk e1d2s25 of recovery group dssg2 is not currently scheduled for replacement.
mmvdisk:
mmvdisk:
mmvdisk: Command failed. Examine previous error messages to determine cause.

Do I have to use the --force option? I would like to get this disk out of the file system ASAP.



JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org