Hi,

I had to restart two of my GPFS servers (gpfs-n4 and gpfs-quorum), and afterwards I was unable to move the CES IP address back. The command failed with the strange error "mmces address move: GPFS is down on this node". After double-checking that the GPFS state is active on all nodes, I dug deeper and I think I found the problem, but I don't really know how this could happen.

Look at the node names:

[root@gpfs-n2 ~]# mmlscluster     # Looks good

GPFS cluster information
========================
  GPFS cluster name:         gpfscl1.img.local
  GPFS cluster id:           17792677515884116443
  GPFS UID domain:           img.local
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name       IP address       Admin node name        Designation
----------------------------------------------------------------------------------
   1   gpfs-n4.img.local      192.168.20.64    gpfs-n4.img.local      quorum-manager
   2   gpfs-quorum.img.local  192.168.20.60    gpfs-quorum.img.local  quorum
   3   gpfs-n3.img.local      192.168.20.63    gpfs-n3.img.local      quorum-manager
   4   tau.img.local          192.168.1.248    tau.img.local
   5   gpfs-n1.img.local      192.168.20.61    gpfs-n1.img.local      quorum-manager
   6   gpfs-n2.img.local      192.168.20.62    gpfs-n2.img.local      quorum-manager
   8   whale.img.cas.cz       147.231.150.108  whale.img.cas.cz


[root@gpfs-n2 ~]# mmlsmount gpfs01 -L   # not so good

File system gpfs01 is mounted on 7 nodes:
  192.168.20.63   gpfs-n3
  192.168.20.61   gpfs-n1
  192.168.20.62   gpfs-n2
  192.168.1.248   tau
  192.168.20.64   gpfs-n4.img.local
  192.168.20.60   gpfs-quorum.img.local
  147.231.150.108 whale.img.cas.cz

[root@gpfs-n2 ~]# tsctl shownodes up | tr ','  '\n'   # very wrong
whale.img.cas.cz.img.local
tau.img.local
gpfs-quorum.img.local.img.local
gpfs-n1.img.local
gpfs-n2.img.local
gpfs-n3.img.local
gpfs-n4.img.local.img.local
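
To make the mismatch easier to spot, the names reported by tsctl can be compared against the daemon node names shown by mmlscluster. This is only a minimal sketch: unexpected_names is a hypothetical helper, not a GPFS command, and expected.txt is assumed to be a hand-written file with one daemon node name per line, copied from the mmlscluster output.

```shell
# unexpected_names EXPECTED_FILE
# Reads a comma- or newline-separated list of node names on stdin and prints
# every name that does not exactly match a line in EXPECTED_FILE.
# (Hypothetical helper for illustration; not part of GPFS.)
unexpected_names() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    tr ',' '\n' | grep -v -x -F -f "$1" || true
}

# Assumed usage on a cluster node:
#   tsctl shownodes up | unexpected_names expected.txt
```

With the output shown here, this should list the three names that have an extra ".img.local" appended.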

The "tsctl shownodes up" output is the reason why I'm not able to move the CES address back to the gpfs-n4 node, but the real problem is the inconsistent node names. I think the OS is configured correctly:

[root@gpfs-n4 /]# hostname
gpfs-n4

[root@gpfs-n4 /]# hostname -f
gpfs-n4.img.local

[root@gpfs-n4 /]# cat /etc/resolv.conf
nameserver 192.168.20.30
nameserver 147.231.150.2
search img.local
domain img.local

[root@gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4
192.168.20.64    gpfs-n4.img.local gpfs-n4

[root@gpfs-n4 /]# host gpfs-n4
gpfs-n4.img.local has address 192.168.20.64

[root@gpfs-n4 /]# host 192.168.20.64
64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local.

Can someone help me with this?

Thanks,
Michal

P.S. GPFS version: 4.2.3-2 (CentOS 7)
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss