"Kenneth Waegeman" <kenneth.waege...@ugent.be> wrote:
> Currently our file system is down due to down/unrecovered disks. We
> try to start the disks again with mmchdisk, but when we do this, we
> see this error in our mmfs.log:
> ...
> This is a 3-way replicated vdisk, and not one of the recovering disks, but
> this disk is in 'up' state..
First, please open a PMR through your normal support organization, and make it clear in the PMR that the problem involves GPFS and GNR (a.k.a. ESS).  That way, it will be assigned to the correct support group.  Support will request that you upload a snap.
There seems to be a combination of two problems here:
First, an NSD (which is also a GNR vdisk) is down.  That is usually caused by an I/O error on the vdisk, or by both servers for the recovery group that contains the vdisk being down simultaneously.  Usually, it is easily fixed by running mmchdisk with the start option, but you tried that and it didn't work.  This problem is at the NSD layer (meaning in the GPFS client that accesses the GNR vdisk), not in the GNR layer.
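For reference, the usual NSD-layer sequence looks roughly like the sketch below.  The file system name gpfs0 and the run helper are assumptions made for illustration, not taken from your setup; by default the script only prints the commands, and it executes them only if you set RUN=1 on a GPFS node.

```shell
#!/bin/sh
# Hypothetical file system device name; substitute your own.
FS="${FS:-gpfs0}"

# Illustrative helper: print each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# List the disks that are not both 'ready' and 'up'.
run mmlsdisk "$FS" -e
# Try to start all disks of the file system.
run mmchdisk "$FS" start -a
# Check again to see which disks are still down.
run mmlsdisk "$FS" -e
```

If disks are still listed after the start attempt, that output belongs in the PMR.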
Second, another vdisk has an internal error, caused by a read error from the physical disks (which is what "uncorrectable read error" means).  Now, given that you say this vdisk is 3-way replicated, that probably means that there are multiple problems, since a single bad read would normally be repaired from another replica.  This error is purely in the GNR layer, and the error message you quote, "smallRead VIO...", comes from the GNR layer.  Now, an error from one vdisk can't prevent mmchdisk on a different vdisk from working, so these two problems seem unrelated.
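To look at the GNR layer yourself while waiting for support, something along the lines of the sketch below may help.  The recovery group name rg1 and the run helper are assumptions made for illustration; by default the script only prints the commands, and it executes them only if you set RUN=1 on a recovery group server.

```shell
#!/bin/sh
# Hypothetical recovery group name; substitute your own.
RG="${RG:-rg1}"

# Illustrative helper: print each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Detailed view of the recovery group, including its vdisks.
run mmlsrecoverygroup "$RG" -L
# Physical disks in any recovery group that are not in 'ok' state.
run mmlspdisk all --not-ok
```

Several pdisks reported as not ok would be consistent with read errors on a 3-way replicated vdisk, but that is only a first hint, not a diagnosis.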
However, I'm going to bet that the two problems (which at first seem unrelated) in reality have a common root cause; two unrelated problems showing up at the same time would be too weird a coincidence.  Debugging this requires looking at far more information than a single line from the mmfs.log file, which is why the support organization needs a complete PMR opened and the usual snap (with logs, dumps, ...) uploaded, so it can see what the cause of the problem is.
Good luck!
Ralph Becker-Szendy
IBM Almaden Research Center - Computer Science - Storage Systems
650 Harry Road, K56-B3, San Jose, CA 95120

