Based on experiments on my test cluster, I can assure you that you can
list and change GPFS configuration parameters with CCR enabled while GPFS
is down.
I understand you are having a problem with your cluster, but you are
incorrectly disparaging the CCR.
In fact you can run mmshutdown -a AND kill all GPFS-related processes,
including mmsdrserv and mmccrmonitor, and then issue commands like:
mmlscluster, mmlsconfig, mmchconfig
Those will work correctly and, by the way, restart mmsdrserv and
mmccrmonitor...
(Use a command like `ps auxw | grep mm` to find the relevant processes.)
But that will not start the main GPFS file manager process, mmfsd. GPFS
"proper" remains down...
For the following commands Linux was "up" on all nodes, but GPFS was
shut down.
[root@n2 gpfs-git]# mmgetstate -a
 Node number  Node name        GPFS state
------------------------------------------
       1      n2               down
       3      n4               down
       4      n5               down
       6      n3               down
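If you want to summarize output like the above in a script rather than by eye, here is a minimal sketch. It assumes the layout shown (data rows start with a numeric node number and the state is the third field), and count_state is a hypothetical helper of mine, not a GPFS command.

```shell
#!/bin/sh
# Sketch only: count nodes in a given state from `mmgetstate -a`-style text.
# count_state is a hypothetical helper, not part of GPFS.
count_state() {
    # $1 = state to count (e.g. down); reads mmgetstate output on stdin.
    # Header and separator lines are skipped because their first field
    # is not a number.
    awk -v want="$1" '$1 ~ /^[0-9]+$/ && $3 == want { n++ } END { print n + 0 }'
}

# Sample copied from the transcript above.
sample='Node number Node name GPFS state
------------------------------------------
1 n2 down
3 n4 down
4 n5 down
6 n3 down'

printf '%s\n' "$sample" | count_state down
```

On a live cluster the same helper could be fed directly, as in `mmgetstate -a | count_state down`.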
However, if a majority of the quorum nodes cannot be reached, you WILL
see a sequence of messages like this, after a noticeable "timeout":
(For the following test I had three quorum nodes and did a Linux shutdown
on two of them...)
[root@n2 gpfs-git]# mmlsconfig
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158
mmlsconfig: Command failed. Examine previous error messages to determine
cause.
[root@n2 gpfs-git]# mmchconfig worker1Threads=1022
mmchconfig: Unable to obtain the GPFS configuration file lock.
mmchconfig: GPFS was unable to obtain a lock from node n2.frozen.
mmchconfig: Command failed. Examine previous error messages to determine
cause.
[root@n2 gpfs-git]# mmgetstate -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158
mmgetstate: Command failed. Examine previous error messages to determine
cause.
HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it
should check!
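The majority rule behind those errors can be sketched in a few lines of shell. This is only an illustration of "a majority of the quorum nodes", assuming plain node majority and no other quorum configuration; quorum_needed and have_quorum are hypothetical helper names, not GPFS commands.

```shell
#!/bin/sh
# Sketch only: strict-majority arithmetic for quorum nodes.
# quorum_needed and have_quorum are hypothetical helpers for illustration.
quorum_needed() {
    # $1 = total number of quorum nodes; majority is floor(n/2) + 1.
    echo $(( $1 / 2 + 1 ))
}

have_quorum() {
    # $1 = total quorum nodes, $2 = quorum nodes currently reachable.
    [ "$2" -ge "$(quorum_needed "$1")" ]
}

# My test above: three quorum nodes, two of them shut down.
if have_quorum 3 1; then echo "1 of 3: quorum"; else echo "1 of 3: no quorum"; fi
# After bringing one of them back.
if have_quorum 3 2; then echo "2 of 3: quorum"; else echo "2 of 3: no quorum"; fi
```

With three quorum nodes, one reachable node is below the majority of two, which matches the "Not enough CCR quorum nodes available" failures above.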
Then I restarted Linux on n3... So I have two of the three quorum nodes
(n2 and n3) active, but GPFS is still down...
## From n2, login to node n3 that I just rebooted...
[root@n2 gpfs-git]# ssh n3
Last login: Thu Jul 28 09:50:53 2016 from n2.frozen
## See if any mm processes are running? ... NOPE!
[root@n3 ~]# ps auxw | grep mm
root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm
## Check the state... notice n4 is powered off...
[root@n3 ~]# mmgetstate -a
 Node number  Node name        GPFS state
------------------------------------------
       1      n2               down
       3      n4               unknown
       4      n5               down
       6      n3               down
## Examine the cluster configuration
[root@n3 ~]# mmlscluster
GPFS cluster information
========================
GPFS cluster name: madagascar.frozen
GPFS cluster id: 7399668614468035547
GPFS UID domain: madagascar.frozen
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
GPFS cluster configuration servers:
-----------------------------------
Primary server: n2.frozen (not in use)
Secondary server: n4.frozen (not in use)
 Node  Daemon node name  IP address   Admin node name  Designation
-------------------------------------------------------------------
   1   n2.frozen         172.20.0.21  n2.frozen        quorum-manager-perfmon
   3   n4.frozen         172.20.0.23  n4.frozen        quorum-manager-perfmon
   4   n5.frozen         172.20.0.24  n5.frozen        perfmon
   6   n3.frozen         172.20.0.22  n3.frozen        quorum-manager-perfmon
## notice that mmccrmonitor and mmsdrserv are running but not mmfsd
[root@n3 ~]# ps auxw | grep mm
root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes
root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm
## Now I can mmchconfig ... while GPFS remains down.
[root@n3 ~]# mmchconfig worker1Threads=1022
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started
Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0
[root@n3 ~]# mmgetstate -a
 Node number  Node name        GPFS state
------------------------------------------
       1      n2               down
       3      n4               unknown
       4      n5               down
       6      n3               down
## Quorum node n4 remains unreachable... But n2 and n3 are running Linux.
[root@n3 ~]# ping -c 1 n4
PING n4.frozen (172.20.0.23) 56(84) bytes of data.
From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable
--- n4.frozen ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
[root@n3 ~]# exit
logout
Connection to n3 closed.
[root@n2 gpfs-git]# ps auwx | grep mm
root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm
root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1
root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py
[root@n2 gpfs-git]#
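If you would rather not read the ps listings by eye, the check I keep doing above (mmsdrserv and mmccrmonitor up, mmfsd not) can be scripted roughly like this. It is a sketch: gpfs_procs is a hypothetical helper of mine, and it assumes the daemons appear in ps output under /usr/lpp/mmfs/bin/, as mmsdrserv and mmccrmonitor do in the listings above; I am assuming the same for mmfsd.

```shell
#!/bin/sh
# Sketch only: report which GPFS-related processes appear in ps output.
# gpfs_procs is a hypothetical helper, not a GPFS command.
gpfs_procs() {
    # Reads `ps auxw`-style text on stdin.
    ps_text=$(cat)
    for d in mmfsd mmsdrserv mmccrmonitor; do
        # Match on the full binary path so the "grep ... mm" line
        # in the listing is not counted as a daemon.
        if printf '%s\n' "$ps_text" | grep -q "/usr/lpp/mmfs/bin/$d"; then
            echo "$d running"
        else
            echo "$d not running"
        fi
    done
}

# Sample taken from the n3 listing above (wrapped lines joined).
sample='root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes
root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm'

printf '%s\n' "$sample" | gpfs_procs
```

On a real node you would pipe the live listing in instead, as in `ps auxw | gpfs_procs`.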