Doug - Regarding pgsql: the version of that OCF script that comes with 2.1.3 isn't "the best" one. I'd strongly recommend getting a newer one from one of the later builds.

On Wed, Apr 30, 2008 at 1:32 PM, Doug Knight <[EMAIL PROTECTED]> wrote:
> Hi,
> I am performing a rolling upgrade on a RHEL5 system. Old HA was 2.0.8,
> upgrading to 2.1.3, Primary is 2.0.8 and up, secondary was the one being
> upgraded. During the startup I encountered some issues with my OCF
> scripts for our applications, which I have now corrected (mainly the
> relocation of the ocf-shellfuncs, etc). The upgraded node did come up
> and connect to the primary server (though it decided to try restarting
> postgres locally when it wasn't supposed to, more in a later email,
> maybe). There are two things that concern me. First, I saw a warning as
> follows:
>
> WARN: crm_peer_init: Set these options via openais.conf
>
> I did not install AIS, I stayed with the heartbeat-only stack
> (heartbeat, common, resource, heartbeat-pacemaker, etc). Should I be
> concerned about this warning, and if so what should I do about it?
>
> Second, once I let the systems settle out and the logs got quiet, I
> checked status on my resources. As noted previously, pgsql had problems.
> I attempted to clean pgsql (crm_resource -C -r pgsql_5432, which stated
> I needed to use -H, which I did), and I got an emergency condition in
> heartbeat and it rebooted my server! So aside from the pgsql issue, how
> can I prevent heartbeat from doing a reboot? There are other things
> running on this server which a reboot plays havoc with, so I would like
> to avoid a repeat if possible.
>
> Thanks,
> Doug Knight
> WSI Corp
> p.s. below is the log from the point that pgsql was frozen to the
> reboot.
>
> lrmd[5476]: 2008/04/30_13:54:03 notice: on_msg_perform_op: resource pgsql_5432 is frozen, no ops can run.
> crmd[5479]: 2008/04/30_13:54:03 ERROR: do_lrm_rsc_op: Operation monitor on pgsql_5432 failed: -1
> crmd[5479]: 2008/04/30_13:54:03 WARN: do_log: [[FSA]] Input I_FAIL from do_lrm_rsc_op() received in state (S_NOT_DC)
> crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
> crmd[5479]: 2008/04/30_13:54:03 ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
> crmd[5479]: 2008/04/30_13:54:03 ERROR: do_log: [[FSA]] Input I_TERMINATE from do_recover() received in state (S_RECOVERY)
> crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> crmd[5479]: 2008/04/30_13:54:03 info: do_shutdown: All subsystems stopped, continuing
> crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_rpm_conus_4km_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_rpm_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource pgsql_5432 was active at shutdown. You may ignore this error if it is unmanaged.
> crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_sat_hd_vsir_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_etagfs_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> crmd[5479]: 2008/04/30_13:54:03 info: do_lrm_control: Disconnected from the LRM
> ccm[5474]: 2008/04/30_13:54:03 info: client (pid=5479) removed from ccm
> crmd[5479]: 2008/04/30_13:54:03 info: do_ha_control: Disconnected from Heartbeat
> crmd[5479]: 2008/04/30_13:54:03 info: do_cib_control: Disconnecting CIB
> crmd[5479]: 2008/04/30_13:54:03 info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> crmd[5479]: 2008/04/30_13:54:03 info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> crmd[5479]: 2008/04/30_13:54:03 ERROR: do_exit: Could not recover from internal error
> crmd[5479]: 2008/04/30_13:54:03 info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry for node 1
> cib[5475]: 2008/04/30_13:54:03 WARN: send_via_callback_channel: Client 4e142eb9-e202-4a30-98f0-de2091d78976 has disconnected
> crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry for node 0
> cib[5475]: 2008/04/30_13:54:03 WARN: do_local_notify: A-Sync reply to 5479 failed: client left before we could send reply
> crmd[5479]: 2008/04/30_13:54:03 info: do_exit: [crmd] stopped (2)
> heartbeat[5455]: 2008/04/30_13:54:03 WARN: Managed /usr/lib64/heartbeat/crmd process 5479 exited with return code 2.
> heartbeat[5455]: 2008/04/30_13:54:03 EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/crmd
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>

--
Serge Dubrouski.
