Re: [Linux-HA] System rebooted during rolling upgrade

Doug Knight Thu, 01 May 2008 05:11:20 -0700

Serge,
I kept the pgsql OCF script I had been using from the previous install,
which seems to work OK. I thought 2.1.3 was the latest build? Which build
do you recommend I get pgsql from? Using the old pgsql script, I was able
to bring up heartbeat on the secondary. Today I am upgrading the primary,
so I would appreciate any suggestions on how to prevent the emergency
reboot from happening again. I plan to "unmanage" the resources during
the upgrade rather than take them down.


Thanks,
Doug

On Wed, 2008-04-30 at 13:40 -0600, Serge Dubrouski wrote:

> Doug -
> 
> Regarding pgsql: the version of that OCF script that comes with 2.1.3
> isn't "the best" one. I'd strongly recommend getting a newer one from a
> later build.
> 
> On Wed, Apr 30, 2008 at 1:32 PM, Doug Knight <[EMAIL PROTECTED]> wrote:
> > Hi,
> >  I am performing a rolling upgrade on a RHEL5 system from HA 2.0.8 to
> >  2.1.3. The primary is still running 2.0.8 and is up; the secondary is
> >  the one being upgraded. During startup I encountered some issues with
> >  the OCF scripts for our applications, which I have now corrected
> >  (mainly the relocation of ocf-shellfuncs, etc.). The upgraded node did
> >  come up and connect to the primary server (though it tried to restart
> >  postgres locally when it wasn't supposed to; more on that in a later
> >  email, maybe). Two things concern me. First, I saw the following
> >  warning:
> >
> >  WARN: crm_peer_init: Set these options via openais.conf
> >
> >  I did not install AIS; I stayed with the heartbeat-only stack
> >  (heartbeat, common, resource, heartbeat-pacemaker, etc.). Should I be
> >  concerned about this warning, and if so, what should I do about it?
> >
> >  Second, once I let the systems settle and the logs went quiet, I
> >  checked the status of my resources. As noted previously, pgsql had
> >  problems. When I attempted to clean up pgsql (crm_resource -C -r
> >  pgsql_5432, which told me I needed to add -H, which I did), heartbeat
> >  hit an emergency condition and rebooted my server! So, aside from the
> >  pgsql issue, how can I prevent heartbeat from rebooting the node? Other
> >  things run on this server that a reboot plays havoc with, so I would
> >  like to avoid a repeat if possible.
> >
> >  Thanks,
> >  Doug Knight
> >  WSI Corp
> >  P.S. Below is the log from the point where pgsql was frozen up to the
> >  reboot.
> >
> >  lrmd[5476]: 2008/04/30_13:54:03 notice: on_msg_perform_op: resource
> >  pgsql_5432 is frozen, no ops can run.
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_lrm_rsc_op: Operation monitor
> >  on pgsql_5432 failed: -1
> >  crmd[5479]: 2008/04/30_13:54:03 WARN: do_log: [[FSA]] Input I_FAIL from
> >  do_lrm_rsc_op() received in state (S_NOT_DC)
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State
> >  transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL
> >  origin=do_lrm_rsc_op ]
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_recover: Action A_RECOVER
> >  (0000000001000000) not supported
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_log: [[FSA]] Input I_TERMINATE
> >  from do_recover() received in state (S_RECOVERY)
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State
> >  transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE
> >  cause=C_FSA_INTERNAL origin=do_recover ]
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_shutdown: All subsystems
> >  stopped, continuing
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
> >  get_vortex_rpm_conus_4km_ingestor_HA was active at shutdown.  You may
> >  ignore this error if it is unmanaged.
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
> >  get_vortex_rpm_ingestor_HA was active at shutdown.  You may ignore this
> >  error if it is unmanaged.
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
> >  pgsql_5432 was active at shutdown.  You may ignore this error if it is
> >  unmanaged.
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
> >  get_sat_hd_vsir_ingestor_HA was active at shutdown.  You may ignore this
> >  error if it is unmanaged.
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
> >  get_vortex_etagfs_ingestor_HA was active at shutdown.  You may ignore
> >  this error if it is unmanaged.
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_lrm_control: Disconnected from
> >  the LRM
> >  ccm[5474]: 2008/04/30_13:54:03 info: client (pid=5479) removed from ccm
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_ha_control: Disconnected from
> >  Heartbeat
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_cib_control: Disconnecting CIB
> >  crmd[5479]: 2008/04/30_13:54:03 info: crmd_cib_connection_destroy:
> >  Connection to the CIB terminated...
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_exit: Performing A_EXIT_0 -
> >  gracefully exiting the CRMd
> >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_exit: Could not recover from
> >  internal error
> >  crmd[5479]: 2008/04/30_13:54:03 info: free_mem: Dropping I_TERMINATE:
> >  [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> >  crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry
> >  for node 1
> >  cib[5475]: 2008/04/30_13:54:03 WARN: send_via_callback_channel: Client
> >  4e142eb9-e202-4a30-98f0-de2091d78976 has disconnected
> >  crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry
> >  for node 0
> >  cib[5475]: 2008/04/30_13:54:03 WARN: do_local_notify: A-Sync reply to
> >  5479 failed: client left before we could send reply
> >  crmd[5479]: 2008/04/30_13:54:03 info: do_exit: [crmd] stopped (2)
> >  heartbeat[5455]: 2008/04/30_13:54:03 WARN:
> >  Managed /usr/lib64/heartbeat/crmd process 5479 exited with return code
> >  2.
> >  heartbeat[5455]: 2008/04/30_13:54:03 EMERG: Rebooting system.
> >  Reason: /usr/lib64/heartbeat/crmd
> >
> >  _______________________________________________
> >  Linux-HA mailing list
> >  [email protected]
> >  http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >  See also: http://linux-ha.org/ReportingProblems
> >
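The quoted log was wrapped by the mail client, which makes the failure sequence hard to follow. Below is a minimal Python sketch for summarizing a log like this one. It assumes that each entry starts with `daemon[pid]: YYYY/MM/DD_HH:MM:SS`, as in the excerpt, and that "> " quote markers may be present. The regular expressions were written only against the messages shown here, so this is not a general heartbeat log parser. `join_entries` and `summarize` are hypothetical helper names. The sketch rejoins the wrapped lines, then extracts three things: the crmd state transitions, the resources that verify_stopped reported as still active, and the reason heartbeat gave for rebooting.

```python
import re

# Assumption: every log entry begins with "daemon[pid]: YYYY/MM/DD_HH:MM:SS ",
# as in the quoted excerpt; any other non-blank line is a wrapped continuation.
ENTRY_START = re.compile(r"^\w+\[\d+\]: \d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2} ")

# Patterns written against the specific messages in this thread (hypothetical,
# not an exhaustive list of heartbeat/crmd message formats).
TRANSITION = re.compile(
    r"do_state_transition: State transition (\S+) -> (\S+) \[ input=(\S+)"
)
ACTIVE_AT_SHUTDOWN = re.compile(
    r"verify_stopped: Resource (\S+) was active at shutdown"
)
REBOOT = re.compile(r"EMERG: Rebooting system\. Reason: (\S+)")


def join_entries(lines):
    """Strip mail quote markers and rejoin wrapped lines, one string per entry."""
    entries = []
    for raw in lines:
        line = raw.lstrip("> ").rstrip()  # drop leading ">" and spaces
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)  # start of a new log entry
        else:
            entries[-1] += " " + line  # continuation of the previous entry
    return entries


def summarize(lines):
    """Return (state transitions, resources active at shutdown, reboot reason)."""
    transitions = []
    active = []
    reboot_reason = None
    for entry in join_entries(lines):
        if m := TRANSITION.search(entry):
            transitions.append(m.groups())  # (from_state, to_state, input)
        elif m := ACTIVE_AT_SHUTDOWN.search(entry):
            active.append(m.group(1))
        elif m := REBOOT.search(entry):
            reboot_reason = m.group(1)
    return transitions, active, reboot_reason
```

For example, with the excerpt saved to a file (the name `ha.log` is hypothetical), `summarize(open("ha.log"))` should report the S_NOT_DC -> S_RECOVERY -> S_TERMINATE chain, the five resources left running, and /usr/lib64/heartbeat/crmd as the reboot reason. The `:=` operator requires Python 3.8 or later.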