OK. Your packages are right. The problem here is that it's not clear who currently supports the heartbeat-* packages. I know for sure that Dejan submitted a fixed version of pgsql to Mercurial, but it looks like the heartbeat-* packages weren't rebuilt and still contain the 2.1.3 code. Unfortunately, the current support/release situation for the project isn't clear, but I know that the people at Novell are trying to improve it.
Attached is a fixed version of the pgsql OCF agent.

On Thu, May 1, 2008 at 7:50 AM, Doug Knight <[EMAIL PROTECTED]> wrote:
> Serge,
> I installed the following RPMs:
>
> libnet-1.1.2.1-1.1.x86_64.rpm
> heartbeat-resources-2.1.3-21.1.x86_64.rpm
> heartbeat-common-2.1.3-21.1.x86_64.rpm
> heartbeat-2.1.3-21.1.x86_64.rpm
> pacemaker-heartbeat-0.6.3-6.1.x86_64.rpm
> pacemaker-pygui-1.3-6.1.x86_64.rpm
>
> On the system I was upgrading, yes, I had a stand-alone 8.3.1 instance
> running (not managed by Heartbeat), and, as you suspected, it confused
> the pgsql agent. The version that came with 2.1.3 only checks for the
> postmaster process; it does not use the environment variables to check
> for the specific instance. Reverting to the version I had been using
> before the upgrade worked, because it checks for the postgres root and
> its postmaster.pid file, THEN actually looks for the process ID found
> there.
>
> Doug
>
> On Thu, 2008-05-01 at 06:39 -0600, Serge Dubrouski wrote:
> > Pick up a new pgsql from the Pacemaker repository. Though 2.1.3 is the
> > latest available build of Heartbeat in the old packaging, it's known to
> > have some major issues, so I'd recommend switching to Pacemaker (the
> > Heartbeat version). Some time ago Lars and Andrew announced that they
> > would issue 2.1.4 in the old packaging, but so far that hasn't happened.
> >
> > For my reference: do you have several instances of PostgreSQL running
> > on the same node? The pgsql agent from 2.1.3 shouldn't have any problems
> > unless you do.
> >
> > On Thu, May 1, 2008 at 6:10 AM, Doug Knight <[EMAIL PROTECTED]> wrote:
> > > Serge,
> > > I kept the one I had been using from the previous install, which seems
> > > to work OK. I thought 2.1.3 was the latest build? Which build do you
> > > recommend I grab pgsql from? Using the old pgsql I was able to bring up
> > > heartbeat on the secondary.
> > > Today I'm upgrading the primary, so it would be nice if someone could
> > > suggest how to prevent the emergency reboot from happening. I plan on
> > > "unmanaging" the resources during the upgrade, as opposed to taking
> > > them down.
> > >
> > > Thanks,
> > > Doug
> > >
> > > On Wed, 2008-04-30 at 13:40 -0600, Serge Dubrouski wrote:
> > > > Doug -
> > > >
> > > > Regarding pgsql: the version of that OCF agent that comes with 2.1.3
> > > > isn't "the best" one. I'd strongly recommend getting a newer one from
> > > > the later builds.
> > > >
> > > > On Wed, Apr 30, 2008 at 1:32 PM, Doug Knight <[EMAIL PROTECTED]> wrote:
> > > > > Hi,
> > > > > I am performing a rolling upgrade on a RHEL5 system. The old HA
> > > > > version was 2.0.8 and I am upgrading to 2.1.3. The primary is on
> > > > > 2.0.8 and up; the secondary was the one being upgraded. During
> > > > > startup I encountered some issues with the OCF scripts for our
> > > > > applications, which I have now corrected (mainly due to the
> > > > > relocation of ocf-shellfuncs, etc.). The upgraded node did come up
> > > > > and connect to the primary server (though it decided to try
> > > > > restarting postgres locally when it wasn't supposed to; more in a
> > > > > later email, maybe). There are two things that concern me. First,
> > > > > I saw the following warning:
> > > > >
> > > > > WARN: crm_peer_init: Set these options via openais.conf
> > > > >
> > > > > I did not install AIS; I stayed with the heartbeat-only stack
> > > > > (heartbeat, common, resource, heartbeat-pacemaker, etc.). Should I
> > > > > be concerned about this warning, and if so, what should I do about
> > > > > it?
> > > > >
> > > > > Second, once I let the systems settle and the logs got quiet, I
> > > > > checked the status of my resources. As noted previously, pgsql had
> > > > > problems.
> > > > > I attempted to clean up pgsql (crm_resource -C -r pgsql_5432, which
> > > > > told me I needed to use -H, which I did), and I got an emergency
> > > > > condition in heartbeat and it rebooted my server! So, aside from
> > > > > the pgsql issue, how can I prevent heartbeat from doing a reboot?
> > > > > There are other things running on this server that a reboot plays
> > > > > havoc with, so I would like to avoid a repeat if possible.
> > > > >
> > > > > Thanks,
> > > > > Doug Knight
> > > > > WSI Corp
> > > > >
> > > > > P.S. Below is the log from the point that pgsql was frozen to the
> > > > > reboot.
> > > > >
> > > > > lrmd[5476]: 2008/04/30_13:54:03 notice: on_msg_perform_op: resource pgsql_5432 is frozen, no ops can run.
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: do_lrm_rsc_op: Operation monitor on pgsql_5432 failed: -1
> > > > > crmd[5479]: 2008/04/30_13:54:03 WARN: do_log: [[FSA]] Input I_FAIL from do_lrm_rsc_op() received in state (S_NOT_DC)
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: do_log: [[FSA]] Input I_TERMINATE from do_recover() received in state (S_RECOVERY)
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_shutdown: All subsystems stopped, continuing
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_rpm_conus_4km_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_rpm_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource pgsql_5432 was active at shutdown. You may ignore this error if it is unmanaged.
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_sat_hd_vsir_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_etagfs_ingestor_HA was active at shutdown. You may ignore this error if it is unmanaged.
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_lrm_control: Disconnected from the LRM
> > > > > ccm[5474]: 2008/04/30_13:54:03 info: client (pid=5479) removed from ccm
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_ha_control: Disconnected from Heartbeat
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_cib_control: Disconnecting CIB
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> > > > > crmd[5479]: 2008/04/30_13:54:03 ERROR: do_exit: Could not recover from internal error
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry for node 1
> > > > > cib[5475]: 2008/04/30_13:54:03 WARN: send_via_callback_channel: Client 4e142eb9-e202-4a30-98f0-de2091d78976 has disconnected
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry for node 0
> > > > > cib[5475]: 2008/04/30_13:54:03 WARN: do_local_notify: A-Sync reply to 5479 failed: client left before we could send reply
> > > > > crmd[5479]: 2008/04/30_13:54:03 info: do_exit: [crmd] stopped (2)
> > > > > heartbeat[5455]: 2008/04/30_13:54:03 WARN: Managed /usr/lib64/heartbeat/crmd process 5479 exited with return code 2.
> > > > > heartbeat[5455]: 2008/04/30_13:54:03 EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/crmd

--
Serge Dubrouski
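The instance-specific check Doug describes in the quoted reply (locate the instance's postmaster.pid under its data directory, then verify that the PID recorded there is alive) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code of either pgsql OCF version discussed here (those agents are shell scripts); the function name `postmaster_running` and the data directory path used below are hypothetical.

```python
import os
from pathlib import Path


def postmaster_running(data_dir: str) -> bool:
    """Return True if the postmaster recorded in data_dir/postmaster.pid is alive.

    Hypothetical sketch of an instance-specific monitor check: probe the PID
    stored in this instance's pid file instead of looking for any process
    named "postmaster" on the node.
    """
    pid_file = Path(data_dir) / "postmaster.pid"
    try:
        # PostgreSQL writes the postmaster's PID on the first line of postmaster.pid.
        first_line = pid_file.read_text().splitlines()[0]
        pid = int(first_line.strip())
    except (OSError, IndexError, ValueError):
        # Missing, empty, unreadable, or malformed pid file: report "not running".
        return False
    if pid <= 0:
        # kill() treats 0 and negative values as process groups; never probe those.
        return False
    try:
        # Signal 0 only performs existence/permission checks; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        # Stale pid file: no process with that PID exists.
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. postgres).
        return True
    return True
```

For example, `postmaster_running("/var/lib/pgsql/data")` (a hypothetical data directory) returns False when that directory has no postmaster.pid. Because the probe is tied to one data directory, a stand-alone server such as Doug's 8.3.1 instance would not be mistaken for the managed one, assuming the two use different data directories. One limitation of this sketch: if a stale postmaster.pid names a PID that the OS has since reused for an unrelated process, it will still report the instance as running.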
[Attachment: pgsql (binary data)]
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
