I have a two-node cluster on RHEL 6.3. It serves three NFS mounts and a Postgres 9.0 database. The database uses a GFS2 disk and the NFS mount points are ext4. I can't seem to fail the services over between nodes without a disable/enable. On top of that issue, please look at my config and let me know where it can be improved in general. Here's a log showing me trying to relocate postgres from one node to the other:

Aug 26 10:50:35 omadvnfs01c rgmanager[9149]: Stopping service service:postgresql90
Aug 26 10:50:35 omadvnfs01c rgmanager[19756]: [ip] Removing IPv4 address 10.198.1.112/24 from bond0
Aug 26 10:50:35 omadvnfs01c avahi-daemon[6596]: Withdrawing address record for 10.198.1.112 on bond0.
Aug 26 10:50:35 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:50:45 omadvnfs01c rsyslogd-2177: imuxsock lost 270 messages from pid 5431 due to rate-limiting
Aug 26 10:50:45 omadvnfs01c rgmanager[20118]: [script] Executing /etc/init.d/postgresql-9.0 stop
Aug 26 10:50:45 omadvnfs01c postgres[18312]: [2-1] LOG: received fast shutdown request
Aug 26 10:50:45 omadvnfs01c postgres[18312]: [3-1] LOG: aborting any active transactions
Aug 26 10:50:45 omadvnfs01c postgres[19284]: [10-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19207]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19102]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19100]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19099]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19141]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19142]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19072]: [2-1] LOG: autovacuum launcher shutting down
Aug 26 10:50:45 omadvnfs01c postgres[19138]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19137]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19139]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19134]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19110]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19136]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19098]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19101]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19140]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19135]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:45 omadvnfs01c postgres[19133]: [2-1] FATAL: terminating connection due to administrator command
Aug 26 10:50:46 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:50:55 omadvnfs01c nrpe[20652]: Error: Could not complete SSL handshake. 5
Aug 26 10:50:55 omadvnfs01c rsyslogd-2177: imuxsock lost 352 messages from pid 5431 due to rate-limiting
Aug 26 10:50:57 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:05 omadvnfs01c rsyslogd-2177: imuxsock lost 32 messages from pid 5431 due to rate-limiting
Aug 26 10:51:15 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:24 omadvnfs01c rsyslogd-2177: imuxsock lost 212 messages from pid 5431 due to rate-limiting
Aug 26 10:51:27 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:45 omadvnfs01c rsyslogd-2177: imuxsock lost 38 messages from pid 5431 due to rate-limiting
Aug 26 10:51:46 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:46 omadvnfs01c rgmanager[22393]: [script] script:postgresql90-init: stop of /etc/init.d/postgresql-9.0 failed (returned 1)
Aug 26 10:51:46 omadvnfs01c rgmanager[9149]: stop on script "postgresql90-init" returned 1 (generic error)
Aug 26 10:51:46 omadvnfs01c rgmanager[22492]: [fs] unmounting /data03
Aug 26 10:51:46 omadvnfs01c rgmanager[22533]: [fs] Sending SIGTERM to processes on /data03
Aug 26 10:51:52 omadvnfs01c rsyslogd-2177: imuxsock lost 248 messages from pid 5431 due to rate-limiting
Aug 26 10:51:52 omadvnfs01c rgmanager[22636]: [fs] unmounting /data03
Aug 26 10:51:52 omadvnfs01c rgmanager[22677]: [fs] Sending SIGKILL to processes on /data03
Aug 26 10:51:55 omadvnfs01c rsyslogd-2177: imuxsock begins to drop messages from pid 5431 due to rate-limiting
Aug 26 10:51:57 omadvnfs01c rgmanager[23435]: [fs] unmounting /data03
Aug 26 10:51:58 omadvnfs01c rsyslogd-2177: imuxsock lost 344 messages from pid 5431 due to rate-limiting
Aug 26 10:51:58 omadvnfs01c rgmanager[9149]: #12: RG service:postgresql90 failed to stop; intervention required
Aug 26 10:51:58 omadvnfs01c rgmanager[9149]: Service service:postgresql90 is failed

Here is my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="166" name="omadvnfs01">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="omadvnfs01c.sec.jel.lc" nodeid="1">
      <fence>
        <method name="drac">
          <device name="omadvnfs01c-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="omadvnfs01b.sec.jel.lc" nodeid="2">
      <fence>
        <method name="drac">
          <device name="omadvnfs01b-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_drac5" ipaddr="10.98.1.213" login="root" module_name="omadvnfs01c" name="omadvnfs01c-drac" passwd="narf" secure="on"/>
    <fencedevice agent="fence_drac5" ipaddr="10.98.1.212" login="root" module_name="omadvnfs01b" name="omadvnfs01b-drac" passwd="narf" secure="on"/>
  </fencedevices>
  <rm>
    <resources>
      <nfsexport name="data01a"/>
      <nfsexport name="data01b"/>
      <nfsexport name="data01c"/>
      <nfsclient allow_recover="on" name="omadvdss01a" options="rw,no_root_squash,async" target="omadvdss01a"/>
      <nfsclient allow_recover="on" name="omadvdss01b" options="rw,no_root_squash,async" target="omadvdss01b"/>
      <nfsclient allow_recover="on" name="omadvdss01c" options="rw,no_root_squash,async" target="omadvdss01c"/>
      <script file="/etc/init.d/postgresql-9.0" name="postgresql90-init"/>
      <script file="/etc/init.d/postgresql-9.1" name="postgresql91-init"/>
      <ip address="10.198.1.112" monitor_link="on" sleeptime="10"/>
      <ip address="10.198.1.113" monitor_link="on" sleeptime="10"/>
      <ip address="10.198.1.114" monitor_link="on" sleeptime="10"/>
      <ip address="10.198.1.115" monitor_link="on" sleeptime="10"/>
      <script file="/etc/init.d/postgresql-8.4" name="postgresql84-init"/>
      <fs device="/dev/vg_data01a/lv_data01a" force_unmount="1" fsid="18521" self_fence="1" fstype="ext4" mountpoint="/data01a" name="omadvnfs01-data01a" nfslock="1" options="noatime,nodiratime,data=writeback,commit=30"/>
      <fs device="/dev/vg_data01b/lv_data01b" force_unmount="1" fsid="6623" self_fence="1" fstype="ext4" mountpoint="/data01b" name="omadvnfs01-data01b" nfslock="1" options="noatime,nodiratime,data=writeback,commit=30"/>
      <fs device="/dev/vg_data01c/lv_data01c" force_unmount="1" fsid="91523" self_fence="1" fstype="ext4" mountpoint="/data01c" name="omadvnfs01-data01c" nfslock="1" options="noatime,nodiratime,data=writeback,commit=30"/>
      <fs device="/dev/vg_data03/lv_data03" force_unmount="1" force_fsck="1" self_fence="1" fsid="15631" fstype="gfs2" mountpoint="/data03" name="omadvnfs01-data03" options=""/>
    </resources>
    <failoverdomains>
      <failoverdomain name="fd_omadvnfs01c" nofailback="1" ordered="1" restricted="0">
        <failoverdomainnode name=" omadvnfs01c.sec.jel.lc" priority="1"/>
        <failoverdomainnode name=" omadvnfs01b.sec.jel.lc" priority="2"/>
      </failoverdomain>
      <failoverdomain name="fd_omadvnfs01b" nofailback="1" ordered="1" restricted="0">
        <failoverdomainnode name=" omadvnfs01b.sec.jel.lc" priority="1"/>
        <failoverdomainnode name=" omadvnfs01c.sec.jel.lc" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <service domain="fd_omadvnfs01b" name="omadvnfs01-nfs-data01b" nfslock="1" recovery="relocate">
      <fs ref="omadvnfs01-data01b">
        <nfsexport ref="data01b">
          <ip ref="10.198.1.114"/>
          <nfsclient ref="omadvdss01a"/>
          <nfsclient ref="omadvdss01b"/>
          <nfsclient ref="omadvdss01c"/>
        </nfsexport>
      </fs>
    </service>
    <service domain="fd_omadvnfs01c" name="omadvnfs01-nfs-data01a" nfslock="1" recovery="relocate">
      <fs ref="omadvnfs01-data01a">
        <nfsexport ref="data01a">
          <ip ref="10.198.1.113"/>
          <nfsclient ref="omadvdss01a"/>
          <nfsclient ref="omadvdss01b"/>
          <nfsclient ref="omadvdss01c"/>
        </nfsexport>
      </fs>
    </service>
    <service domain="fd_omadvnfs01c" name="omadvnfs01-nfs-data01c" nfslock="1" recovery="relocate">
      <fs ref="omadvnfs01-data01c">
        <nfsexport ref="data01c">
          <ip ref="10.198.1.115"/>
          <nfsclient ref="omadvdss01a"/>
          <nfsclient ref="omadvdss01b"/>
          <nfsclient ref="omadvdss01c"/>
        </nfsexport>
      </fs>
    </service>
    <service domain="fd_omadvnfs01b" name="postgresql90" recovery="relocate">
      <ip ref="10.198.1.112"/>
      <fs ref="omadvnfs01-data03">
        <script ref="postgresql90-init"/>
      </fs>
    </service>
  </rm>
  <logging debug="on" logfile="/var/log/cluster.log" logfile_priority="debug"/>
</cluster>

There's nothing of interest in my cluster.log file during the time when I attempted to relocate.
-- Linux-cluster mailing list Linux-cluster@redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster