Can you see the block devices from inside the OS after the reboot?  I don't 
see where you mention this.  How is the storage attached to the server?  A 
DCS3700/DCS3800 can be FC, SAS, or IB; which is yours?  Do the nodes share the 
storage?  Are all NSDs in the same failure group?  This quickly brought to mind 
a failed SRP_DAEMON lookup to IB storage from a badly updated IB card, but I 
would hope you'd notice the lack of block devices.


cat /proc/partitions ?
multipath -l ?
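
A quick way to act on the first check is to count the device-mapper entries 
the kernel reports.  The following is only a minimal sketch: the helper name 
count_dm_devices is hypothetical (it is not part of GPFS or multipath-tools), 
and it assumes the usual four-column layout of /proc/partitions (major, 
minor, #blocks, name).

```shell
# Hypothetical helper: count device-mapper entries (dm-N) listed in a
# /proc/partitions-style file. Defaults to the live /proc/partitions.
count_dm_devices() {
    # Column 4 holds the device name; the header and blank lines never match.
    awk '$4 ~ /^dm-[0-9]+$/ { n++ } END { print n + 0 }' "${1:-/proc/partitions}"
}
```

If count_dm_devices prints 0 after the reboot, the multipath devices probably 
never appeared, and the multipath -l output is the next thing to look at.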


Our GPFS changes device mapper multipath names all the time (dm-127 one day, 
dm-something else another), so that is no problem.  But whacking the volume 
label is a pain.  
When hardware dies, if you have NSDs sharing the same LUNs, you can just 
transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle.

Ed Wahl
OSC


________________________________________
From: [email protected] [[email protected]] on 
behalf of Jared David Baker [[email protected]]
Sent: Wednesday, October 29, 2014 11:31 AM
To: [email protected]
Subject: [gpfsug-discuss] Server lost NSD mappings

Hello all,

I’m hoping that somebody can shed some light on a problem that I experienced 
yesterday. I’ve been working with GPFS as an admin for a couple of months now, 
but I’ve come across a problem that I can’t find the answer to. Hopefully the 
solution is not listed somewhere obvious on the web, but I spent a fair 
amount of time looking last night. Here is the situation: yesterday, I needed 
to update the firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS 
servers, then repeat for the sister node (IBM x3550 and DCS3850) as HPSS for 
our main campus cluster. However, upon reboot, the server seemed to lose the 
path mappings to the multipath devices for the NSDs. Output below:

--
[root@mmmnsd5 ~]# mmlsnsd -m -f gscratch

Disk name    NSD volume ID      Device         Node name                Remarks
---------------------------------------------------------------------------------------
dcs3800u31a_lun0 0A62001B54235577   -              mminsd5.infini           (not found) server node
dcs3800u31a_lun0 0A62001B54235577   -              mminsd6.infini           (not found) server node
dcs3800u31a_lun10 0A62001C542355AA   -              mminsd6.infini           (not found) server node
dcs3800u31a_lun10 0A62001C542355AA   -              mminsd5.infini           (not found) server node
dcs3800u31a_lun2 0A62001C54235581   -              mminsd6.infini           (not found) server node
dcs3800u31a_lun2 0A62001C54235581   -              mminsd5.infini           (not found) server node
dcs3800u31a_lun4 0A62001B5423558B   -              mminsd5.infini           (not found) server node
dcs3800u31a_lun4 0A62001B5423558B   -              mminsd6.infini           (not found) server node
dcs3800u31a_lun6 0A62001C54235595   -              mminsd6.infini           (not found) server node
dcs3800u31a_lun6 0A62001C54235595   -              mminsd5.infini           (not found) server node
dcs3800u31a_lun8 0A62001B5423559F   -              mminsd5.infini           (not found) server node
dcs3800u31a_lun8 0A62001B5423559F   -              mminsd6.infini           (not found) server node
dcs3800u31b_lun1 0A62001B5423557C   -              mminsd5.infini           (not found) server node
dcs3800u31b_lun1 0A62001B5423557C   -              mminsd6.infini           (not found) server node
dcs3800u31b_lun11 0A62001C542355AF   -              mminsd6.infini           (not found) server node
dcs3800u31b_lun11 0A62001C542355AF   -              mminsd5.infini           (not found) server node
dcs3800u31b_lun3 0A62001C54235586   -              mminsd6.infini           (not found) server node
dcs3800u31b_lun3 0A62001C54235586   -              mminsd5.infini           (not found) server node
dcs3800u31b_lun5 0A62001B54235590   -              mminsd5.infini           (not found) server node
dcs3800u31b_lun5 0A62001B54235590   -              mminsd6.infini           (not found) server node
dcs3800u31b_lun7 0A62001C5423559A   -              mminsd6.infini           (not found) server node
dcs3800u31b_lun7 0A62001C5423559A   -              mminsd5.infini           (not found) server node
dcs3800u31b_lun9 0A62001B542355A4   -              mminsd5.infini           (not found) server node
dcs3800u31b_lun9 0A62001B542355A4   -              mminsd6.infini           (not found) server node

[root@mmmnsd5 ~]#
--

Also, the system was working fantastically before the reboot, but now I’m 
unable to mount the GPFS filesystem. The disk names look like they are there 
and mapped to the NSD volume ID, but there is no Device. I’ve created the 
/var/mmfs/etc/nsddevices script, and it produces the following output with 
the user exit returning 0:

--
[root@mmmnsd5 ~]# /var/mmfs/etc/nsddevices
mapper/dcs3800u31a_lun0 dmm
mapper/dcs3800u31a_lun10 dmm
mapper/dcs3800u31a_lun2 dmm
mapper/dcs3800u31a_lun4 dmm
mapper/dcs3800u31a_lun6 dmm
mapper/dcs3800u31a_lun8 dmm
mapper/dcs3800u31b_lun1 dmm
mapper/dcs3800u31b_lun11 dmm
mapper/dcs3800u31b_lun3 dmm
mapper/dcs3800u31b_lun5 dmm
mapper/dcs3800u31b_lun7 dmm
mapper/dcs3800u31b_lun9 dmm
[root@mmmnsd5 ~]#
--
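
For reference, the core of a user exit that produces output of this shape 
could look roughly like the sketch below. It is an assumption about how such 
a script might be written, not the script actually used here: the function 
name list_mapper_nsds, the dcs3800 name prefix, and the directory argument 
are all illustrative. Each output line is a device name relative to /dev 
followed by a device type, dmm being the type shown in the output above.

```shell
# Hypothetical sketch of the core of an nsddevices user exit.
# Prints "mapper/<name> dmm" for every entry in the given directory
# (default /dev/mapper) whose name starts with the given prefix.
list_mapper_nsds() {
    dir=${1:-/dev/mapper}
    prefix=${2:-dcs3800}
    for path in "$dir/$prefix"*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$path" ] || continue
        # Strip the directory part and emit "<relative device> <type>".
        echo "mapper/${path##*/} dmm"
    done
}
```

In a real user exit the script would call something like this and then 
return. As far as I know, the sample shipped with GPFS 
(/usr/lpp/mmfs/samples/nsddevices.sample) documents that returning 0 makes 
GPFS skip its built-in device discovery and rely only on the devices listed, 
so a wrong name or device type in this list can hide otherwise visible disks.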

That output looks correct to me based on the documentation. So I went digging 
in the GPFS log file and found this relevant information:

--
Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found.
Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found.
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found.
Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found.
Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found.
Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found.
Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found.
Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found.
Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found.
Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found.
Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found.
Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found.
--
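
Log messages like these can be condensed into a plain list of affected NSD 
names with a small filter, which makes it easier to compare them against the 
nsddevices output. This is a minimal sketch: the helper name failed_nsds is 
hypothetical, and it assumes each message sits on a single line in the 
actual log file, as it normally would before mail wrapping.

```shell
# Hypothetical helper: read GPFS log text on stdin and print the unique
# NSD names found in "I/O to NSD disk, <name>, fails." messages, sorted.
failed_nsds() {
    sed -n 's/.*I\/O to NSD disk, \([^,]*\), fails\..*/\1/p' | sort -u
}
```

It would typically be fed the current GPFS log, for example 
failed_nsds < /var/adm/ras/mmfs.log.latest, assuming the log lives at that 
usual location on the server.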

Okay, so the NSDs can’t be found locally, so I attempted to rediscover them 
by executing the command mmnsddiscover:

--
[root@mmmnsd5 ~]# mmnsddiscover
mmnsddiscover:  Attempting to rediscover the disks.  This may take a while ...
mmnsddiscover:  Finished.
[root@mmmnsd5 ~]#
--

I was hoping that had fixed it, but after restarting GPFS there was still no 
success. Verifying with mmlsnsd -X -f gscratch:

--
[root@mmmnsd5 ~]# mmlsnsd -X -f gscratch

Disk name    NSD volume ID      Device         Devtype  Node name                Remarks
---------------------------------------------------------------------------------------------------
dcs3800u31a_lun0 0A62001B54235577   -              -        mminsd5.infini           (not found) server node
dcs3800u31a_lun0 0A62001B54235577   -              -        mminsd6.infini           (not found) server node
dcs3800u31a_lun10 0A62001C542355AA   -              -        mminsd6.infini           (not found) server node
dcs3800u31a_lun10 0A62001C542355AA   -              -        mminsd5.infini           (not found) server node
dcs3800u31a_lun2 0A62001C54235581   -              -        mminsd6.infini           (not found) server node
dcs3800u31a_lun2 0A62001C54235581   -              -        mminsd5.infini           (not found) server node
dcs3800u31a_lun4 0A62001B5423558B   -              -        mminsd5.infini           (not found) server node
dcs3800u31a_lun4 0A62001B5423558B   -              -        mminsd6.infini           (not found) server node
dcs3800u31a_lun6 0A62001C54235595   -              -        mminsd6.infini           (not found) server node
dcs3800u31a_lun6 0A62001C54235595   -              -        mminsd5.infini           (not found) server node
dcs3800u31a_lun8 0A62001B5423559F   -              -        mminsd5.infini           (not found) server node
dcs3800u31a_lun8 0A62001B5423559F   -              -        mminsd6.infini           (not found) server node
dcs3800u31b_lun1 0A62001B5423557C   -              -        mminsd5.infini           (not found) server node
dcs3800u31b_lun1 0A62001B5423557C   -              -        mminsd6.infini           (not found) server node
dcs3800u31b_lun11 0A62001C542355AF   -              -        mminsd6.infini           (not found) server node
dcs3800u31b_lun11 0A62001C542355AF   -              -        mminsd5.infini           (not found) server node
dcs3800u31b_lun3 0A62001C54235586   -              -        mminsd6.infini           (not found) server node
dcs3800u31b_lun3 0A62001C54235586   -              -        mminsd5.infini           (not found) server node
dcs3800u31b_lun5 0A62001B54235590   -              -        mminsd5.infini           (not found) server node
dcs3800u31b_lun5 0A62001B54235590   -              -        mminsd6.infini           (not found) server node
dcs3800u31b_lun7 0A62001C5423559A   -              -        mminsd6.infini           (not found) server node
dcs3800u31b_lun7 0A62001C5423559A   -              -        mminsd5.infini           (not found) server node
dcs3800u31b_lun9 0A62001B542355A4   -              -        mminsd5.infini           (not found) server node
dcs3800u31b_lun9 0A62001B542355A4   -              -        mminsd6.infini           (not found) server node

[root@mmmnsd5 ~]#
--

Has anybody seen this type of issue before? Will recreating my NSDs destroy 
the filesystem? I think all the data is intact. There is no crucial data on 
this file system yet, so I could recreate it, but I would like to learn how 
to solve a problem like this. Thanks for all help and information.

Regards,

Jared

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
