Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems:
1. gpfs23 - 900+ TB, which corresponds to /scratch and /data, and which I've unmounted across the cluster because it has data replication set to 1.
2. gpfs22 - 42 TB, which corresponds to /home. It has data replication set to 2, so what we're doing is "mmchdisk gpfs22 suspend -d <the gpfs22 NSD>", then doing the firmware upgrade, and once the array is back we're doing a "mmchdisk gpfs22 resume -d <NSD>", followed by "mmchdisk gpfs22 start -d <NSD>".

On the 1st storage array this went very smoothly ... the mmchdisk took about 5 minutes, which is what I would expect.

But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it's been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 ...

There are no waiters of any significance and "mmdiag --iohist" doesn't show any issues either.

Any ideas, anyone? Unless I can figure this out I'm hosed for this downtime, as I've got 7 more arrays to do after this one!

Thanks!

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615)875-9633
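
For reference, the per-array sequence described above (suspend, firmware upgrade, resume, start) can be sketched as a dry-run shell function. This is only an illustration: the function name plan_array_upgrade and the NSD name example_nsd are hypothetical, and the script only prints the commands from the message rather than running them against GPFS.

```shell
#!/bin/sh
# Dry-run sketch: prints the per-array mmchdisk sequence from the message.
# Nothing here touches GPFS; the NSD name below is a hypothetical placeholder.

plan_array_upgrade() {
    fs="$1"    # filesystem device, e.g. gpfs22
    nsd="$2"   # NSD backed by the array being upgraded (placeholder)

    # Stop new block allocations on the NSD before the array goes down.
    echo "mmchdisk $fs suspend -d $nsd"
    echo "# ... firmware upgrade happens here; wait for the array to return ..."
    # Allow allocations again, then bring the disk back up.
    echo "mmchdisk $fs resume -d $nsd"
    echo "mmchdisk $fs start -d $nsd"
}

plan_array_upgrade gpfs22 example_nsd
```

Printing the plan first makes it easy to review the exact commands for each of the remaining arrays before running any of them by hand.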
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
