Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our 
storage arrays.  It is a partial downtime because we have two GPFS filesystems:

1.  gpfs23 - 900+ TB, which corresponds to /scratch and /data.  I’ve unmounted 
it across the cluster because it has data replication set to 1.

2.  gpfs22 - 42 TB, which corresponds to /home.  It has data replication set 
to 2, so what we’re doing is “mmchdisk gpfs22 suspend -d <the gpfs22 NSD>”, 
then doing the firmware upgrade, and once the array is back we’re doing a 
“mmchdisk gpfs22 resume -d <NSD>”, followed by “mmchdisk gpfs22 start -d <NSD>”.
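
As a rough sketch, the per-array sequence above looks something like the following. The NSD name, the run() helper, and the DRY_RUN switch are placeholders for illustration only, not anything GPFS provides; by default the script just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the suspend / upgrade / resume / start sequence from the post.
# FS is the filesystem named in the post; NSD is a hypothetical placeholder.
FS="gpfs22"
NSD="example_nsd"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = run them

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

array_maintenance() {
    run mmchdisk "$FS" suspend -d "$NSD"
    # The firmware upgrade on the array happens here, outside this script.
    run mmchdisk "$FS" resume -d "$NSD"
    run mmchdisk "$FS" start -d "$NSD"
}

array_maintenance
```

With DRY_RUN left at its default this prints the three mmchdisk commands in order, which is a cheap way to double-check the filesystem and NSD names before touching a real array.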

On the 1st storage array this went very smoothly: the mmchdisk took about 5 
minutes, which is what I would expect.

But on the 2nd storage array the mmchdisk appears to be either hung or 
proceeding at a glacial pace.  For more than an hour it’s been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 ...

There are no waiters of any significance and “mmdiag --iohist” doesn’t show any 
issues either.

Any ideas, anyone?  Unless I can figure this out I’m hosed for this downtime, 
as I’ve got 7 more arrays to do after this one!

Thanks!

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615)875-9633



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
