Re: [Linux-HA] System rebooted during rolling upgrade

Doug Knight Thu, 01 May 2008 06:51:07 -0700

Serge,
I installed the following RPMs:

libnet-1.1.2.1-1.1.x86_64.rpm
heartbeat-resources-2.1.3-21.1.x86_64.rpm
heartbeat-common-2.1.3-21.1.x86_64.rpm
heartbeat-2.1.3-21.1.x86_64.rpm
pacemaker-heartbeat-0.6.3-6.1.x86_64.rpm
pacemaker-pygui-1.3-6.1.x86_64.rpm


On the system I was upgrading, yes, I had a stand-alone (non-HB managed)
PostgreSQL 8.3.1 instance running. Your suspicions were correct: it
confused the pgsql script. The version that came with 2.1.3 only checks
for a postmaster process; it does not use the environment variables to
check for the specific instance. Reverting to the version I had been
using before the upgrade worked, as it checks for the postgres root and
the associated postmaster.pid file, THEN actually looks for the process
ID found there.
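
For anyone following along, here is a rough sketch of the kind of
instance-specific check described above. It is not the code from either
version of the pgsql script; the function name pgsql_instance_running is
made up for illustration, and it assumes the data directory is passed in
as the first argument.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from the pgsql resource agent.
# Returns 0 if the postmaster recorded in <data dir>/postmaster.pid is
# alive, 1 otherwise. $1 is the PostgreSQL data directory (an assumption
# for this sketch; the real script takes its settings from OCF parameters).
pgsql_instance_running() {
    pgdata="$1"
    pidfile="$pgdata/postmaster.pid"

    # No PID file means this particular instance is not running,
    # even if some other postmaster is active on the node.
    [ -f "$pidfile" ] || return 1

    # The first line of postmaster.pid holds the postmaster's PID.
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not running
    esac

    # Signal 0 sends nothing; it only tests that the process exists.
    # Note it also fails if the caller may not signal that process,
    # so this is meant to be run as root or as the postgres user.
    kill -0 "$pid" 2>/dev/null
}
```

Calling it as "pgsql_instance_running /path/to/data && echo running"
(with the real data directory substituted) only reports on that one
instance, so a second stand-alone postmaster on the same node does not
affect the result.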

Doug

On Thu, 2008-05-01 at 06:39 -0600, Serge Dubrouski wrote:

> Pick up a new pgsql from the Pacemaker repository. Though 2.1.3 is the
> latest available build for Heartbeat in the old version of packaging,
> it's known for having some major issues. So I'd recommend switching to
> Pacemaker (Heartbeat version). Some time ago Lars and Andrew announced
> that they'd issue 2.1.4 in the old packaging, but so far that hasn't
> happened.
> 
> For my reference. Do you have several instances of PostgreSQL running
> on the same node? pgsql from 2.1.3 shouldn't have any problems unless
> you do.
> 
> On Thu, May 1, 2008 at 6:10 AM, Doug Knight <[EMAIL PROTECTED]> wrote:
> > Serge,
> >  I kept the one I had been using from the previous install, which seems
> >  to work OK. I thought 2.1.3 was the latest build? Which build do you
> >  recommend I grab pgsql from? Using the old pgsql I was able to bring up
> >  heartbeat on the secondary. Today I upgrade the primary, so it would be
> >  nice for someone to suggest how to prevent the emergency reboot from
> >  happening. I plan on "unmanaging" the resources during the upgrade, as
> >  opposed to taking them down.
> >
> >  Thanks,
> >  Doug
> >
> >
> >
> >  On Wed, 2008-04-30 at 13:40 -0600, Serge Dubrouski wrote:
> >
> >  > Doug -
> >  >
> >  > Regarding pgsql. The version of that OCF script that comes with 2.1.3
> >  > isn't "the best" one. I'd strongly recommend getting a newer one from
> >  > the later builds.
> >  >
> >  > On Wed, Apr 30, 2008 at 1:32 PM, Doug Knight <[EMAIL PROTECTED]> wrote:
> >  > > Hi,
> >  > >  I am performing a rolling upgrade on a RHEL5 system. Old HA was 2.0.8,
> >  > >  upgrading to 2.1.3, Primary is 2.0.8 and up, secondary was the one
> >  > >  being upgraded. During the startup I encountered some issues with my
> >  > >  OCF scripts for our applications, which I have now corrected (mainly
> >  > >  the relocation of the ocf-shellfuncs, etc). The upgraded node did come
> >  > >  up and connect to the primary server (though it decided to try
> >  > >  restarting postgres locally when it wasn't supposed to, more in a later
> >  > >  email, maybe). There are two things that concern me. First, I saw a
> >  > >  warning as follows:
> >  > >
> >  > >  WARN: crm_peer_init: Set these options via openais.conf
> >  > >
> >  > >  I did not install AIS, I stayed with the heartbeat-only stack
> >  > >  (heartbeat, common, resource, heartbeat-pacemaker, etc). Should I be
> >  > >  concerned about this warning, and if so what should I do about it?
> >  > >
> >  > >  Second, once I let the systems settle out and the logs got quiet, I
> >  > >  checked status on my resources. As noted previously, pgsql had problems.
> >  > >  I attempted to clean pgsql (crm_resource -C -r pgsql_5432, which stated
> >  > >  I needed to use -H, which I did), and I got an emergency condition in
> >  > >  heartbeat and it rebooted my server! So aside from the pgsql issue, how
> >  > >  can I prevent heartbeat from doing a reboot? There are other things
> >  > >  running on this server which a reboot plays havoc with, so I would like
> >  > >  to avoid a repeat if possible.
> >  > >
> >  > >  Thanks,
> >  > >  Doug Knight
> >  > >  WSI Corp
> >  > >  p.s. below is the log from the point that pgsql was frozen to the
> >  > >  reboot.
> >  > >
> >  > >  lrmd[5476]: 2008/04/30_13:54:03 notice: on_msg_perform_op: resource pgsql_5432 is frozen, no ops can run.
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_lrm_rsc_op: Operation monitor on pgsql_5432 failed: -1
> >  > >  crmd[5479]: 2008/04/30_13:54:03 WARN: do_log: [[FSA]] Input I_FAIL from do_lrm_rsc_op() received in state (S_NOT_DC)
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_lrm_rsc_op ]
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_log: [[FSA]] Input I_TERMINATE from do_recover() received in state (S_RECOVERY)
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_shutdown: All subsystems stopped, continuing
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_rpm_conus_4km_ingestor_HA was active at shutdown.  You may ignore this error if it is unmanaged.
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_rpm_ingestor_HA was active at shutdown.  You may ignore this error if it is unmanaged.
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource pgsql_5432 was active at shutdown.  You may ignore this error if it is unmanaged.
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_sat_hd_vsir_ingestor_HA was active at shutdown.  You may ignore this error if it is unmanaged.
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource get_vortex_etagfs_ingestor_HA was active at shutdown.  You may ignore this error if it is unmanaged.
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_lrm_control: Disconnected from the LRM
> >  > >  ccm[5474]: 2008/04/30_13:54:03 info: client (pid=5479) removed from ccm
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_ha_control: Disconnected from Heartbeat
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_cib_control: Disconnecting CIB
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_exit: Could not recover from internal error
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry for node 1
> >  > >  cib[5475]: 2008/04/30_13:54:03 WARN: send_via_callback_channel: Client 4e142eb9-e202-4a30-98f0-de2091d78976 has disconnected
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry for node 0
> >  > >  cib[5475]: 2008/04/30_13:54:03 WARN: do_local_notify: A-Sync reply to 5479 failed: client left before we could send reply
> >  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_exit: [crmd] stopped (2)
> >  > >  heartbeat[5455]: 2008/04/30_13:54:03 WARN: Managed /usr/lib64/heartbeat/crmd process 5479 exited with return code 2.
> >  > >  heartbeat[5455]: 2008/04/30_13:54:03 EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/crmd
> >  > >
> >  > >  _______________________________________________
> >  > >  Linux-HA mailing list
> >  > >  [email protected]
> >  > >  http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >  > >  See also: http://linux-ha.org/ReportingProblems
> >  > >
> >  >
> >  >
> >  >
> >
> 
> 
> 
