Michal,

When a node is added to a cluster and its domain differs from that of the other nodes, the GPFS daemons running on the various nodes can develop an inconsistent understanding of what the common suffix of all the domain names is. The symptoms in your "tsctl shownodes up" output, in particular the incorrect names of the two nodes you restarted as seen from a node you did not restart, are consistent with this problem.

Your cluster also appears to meet the necessary precondition for this problem: whale.img.cas.cz does not share a common suffix with the other nodes in the cluster, whose common suffix is ".img.local". Was whale.img.cas.cz recently added to the cluster?
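
To illustrate how such a disagreement could produce the doubled names, here is a minimal Python sketch. It is a simplified model and not the actual GPFS implementation: it assumes each daemon strips the shared domain suffix before sending a node name and re-appends its own idea of that suffix on receipt. The function names (common_domain_suffix, shorten, expand) are hypothetical and exist only for this example.

```python
def common_domain_suffix(names):
    """Return the trailing DNS labels shared by every name (host label excluded)."""
    # Drop the first (host) label of each name and keep only the domain labels.
    domain_labels = [name.split(".")[1:] for name in names]
    shared = []
    # Compare labels from right to left and stop at the first disagreement.
    for column in zip(*(reversed(labels) for labels in domain_labels)):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return ".".join(reversed(shared))


def shorten(name, suffix):
    """Sender side (assumed): strip the suffix the sender believes is common."""
    if suffix and name.endswith("." + suffix):
        return name[: -len(suffix) - 1]
    return name


def expand(short_name, suffix):
    """Receiver side (assumed): re-append the suffix the receiver believes is common."""
    return f"{short_name}.{suffix}" if suffix else short_name


old_nodes = ["gpfs-n2.img.local", "gpfs-n4.img.local", "gpfs-quorum.img.local"]
new_nodes = old_nodes + ["whale.img.cas.cz"]

# A node that was not restarted may still hold the suffix computed before
# whale.img.cas.cz joined; a restarted node recomputes it over all nodes.
stale_suffix = common_domain_suffix(old_nodes)
fresh_suffix = common_domain_suffix(new_nodes)

sent = shorten("gpfs-n4.img.local", fresh_suffix)  # restarted node sends its name
seen = expand(sent, stale_suffix)                  # non-restarted node interprets it
print(repr(stale_suffix), repr(fresh_suffix), seen)
```

Under these assumptions the restarted node finds no common suffix and sends its full name, while the node that was not restarted still appends "img.local", which yields a doubled name of the kind reported by "tsctl shownodes up".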

Unfortunately, the general workaround is to recycle all the nodes at once: "mmshutdown -a", followed by "mmstartup -a".

I hope this helps.

Regards,
The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 .

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: Michal Zacek <[email protected]>
To: [email protected]
Date: 09/12/2017 05:41 AM
Subject: [gpfsug-discuss] Wrong nodename after server restart
Sent by: [email protected]

Hi,

I had to restart two of my GPFS servers (gpfs-n4 and gpfs-quorum), and after that I was unable to move the CES IP address back; the command failed with the strange error "mmces address move: GPFS is down on this node". After I double-checked that the GPFS state is active on all nodes, I dug deeper and I think I found the problem, but I don't really know how this could happen.

Look at the names of the nodes:

[root@gpfs-n2 ~]# mmlscluster     # Looks good

GPFS cluster information
========================
  GPFS cluster name:         gpfscl1.img.local
  GPFS cluster id:           17792677515884116443
  GPFS UID domain:           img.local
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name       IP address       Admin node name        Designation
----------------------------------------------------------------------------------
   1   gpfs-n4.img.local      192.168.20.64    gpfs-n4.img.local      quorum-manager
   2   gpfs-quorum.img.local  192.168.20.60    gpfs-quorum.img.local  quorum
   3   gpfs-n3.img.local      192.168.20.63    gpfs-n3.img.local      quorum-manager
   4   tau.img.local          192.168.1.248    tau.img.local
   5   gpfs-n1.img.local      192.168.20.61    gpfs-n1.img.local      quorum-manager
   6   gpfs-n2.img.local      192.168.20.62    gpfs-n2.img.local      quorum-manager
   8   whale.img.cas.cz       147.231.150.108  whale.img.cas.cz

[root@gpfs-n2 ~]# mmlsmount gpfs01 -L     # not so good

File system gpfs01 is mounted on 7 nodes:
  192.168.20.63    gpfs-n3
  192.168.20.61    gpfs-n1
  192.168.20.62    gpfs-n2
  192.168.1.248    tau
  192.168.20.64    gpfs-n4.img.local
  192.168.20.60    gpfs-quorum.img.local
  147.231.150.108  whale.img.cas.cz

[root@gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n'     # very wrong
whale.img.cas.cz.img.local
tau.img.local
gpfs-quorum.img.local.img.local
gpfs-n1.img.local
gpfs-n2.img.local
gpfs-n3.img.local
gpfs-n4.img.local.img.local

The "tsctl shownodes up" output is the reason why I'm not able to move the CES address back to the gpfs-n4 node, but the real problem is the differing node names.

I think the OS is configured correctly:

[root@gpfs-n4 /]# hostname
gpfs-n4
[root@gpfs-n4 /]# hostname -f
gpfs-n4.img.local
[root@gpfs-n4 /]# cat /etc/resolv.conf
nameserver 192.168.20.30
nameserver 147.231.150.2
search img.local
domain img.local
[root@gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4
192.168.20.64 gpfs-n4.img.local gpfs-n4
[root@gpfs-n4 /]# host gpfs-n4
gpfs-n4.img.local has address 192.168.20.64
[root@gpfs-n4 /]# host 192.168.20.64
64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local.

Can someone help me with this?

Thanks,
Michal

p.s. GPFS version: 4.2.3-2 (CentOS 7)

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
