Re: [Linux-HA] System rebooted during rolling upgrade

Serge Dubrouski Thu, 01 May 2008 06:36:47 -0700

Pick up a new pgsql from the Pacemaker repository. Although 2.1.3 is the
latest available build of Heartbeat in the old packaging, it's known to
have some major issues, so I'd recommend switching to Pacemaker
(Heartbeat version). Some time ago Lars and Andrew announced that they
would release 2.1.4 in the old packaging, but so far that hasn't
happened.


For my reference: do you have several instances of PostgreSQL running
on the same node? pgsql from 2.1.3 shouldn't have any problems unless
you do.
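
To follow the chain of events in a log like the one quoted below, it can
help to filter it down to the serious entries. Here is a minimal sketch
in Python, assuming each entry sits on one unwrapped line in the
"daemon[pid]: timestamp LEVEL: message" layout seen in the quoted
excerpt. The function name serious_entries and the set of levels treated
as serious are illustrative choices, not anything shipped with Heartbeat.

```python
import re

# Layout assumed from the quoted excerpt:
#   crmd[5479]: 2008/04/30_13:54:03 ERROR: do_exit: Could not recover ...
LOG_LINE = re.compile(
    r"^(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]: "
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Za-z]+): (?P<message>.*)$"
)

# Levels treated as serious here (an assumption for this sketch).
SERIOUS = {"ERROR", "CRIT", "EMERG"}


def serious_entries(lines):
    """Return (daemon, level, message) for each serious log entry, in order."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in SERIOUS:
            entries.append(
                (match.group("daemon"), match.group("level"), match.group("message"))
            )
    return entries


# A few entries from the quoted log, rejoined onto single lines.
sample = [
    "lrmd[5476]: 2008/04/30_13:54:03 notice: on_msg_perform_op: resource pgsql_5432 is frozen, no ops can run.",
    "crmd[5479]: 2008/04/30_13:54:03 ERROR: do_lrm_rsc_op: Operation monitor on pgsql_5432 failed: -1",
    "crmd[5479]: 2008/04/30_13:54:03 ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported",
    "heartbeat[5455]: 2008/04/30_13:54:03 WARN: Managed /usr/lib64/heartbeat/crmd process 5479 exited with return code 2.",
    "heartbeat[5455]: 2008/04/30_13:54:03 EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/crmd",
]

for daemon, level, message in serious_entries(sample):
    print(f"{daemon:10} {level:6} {message}")
```

Run against the full log file, this leaves the failed monitor on
pgsql_5432, the crmd recovery failure, and the final EMERG line from
heartbeat, which is the sequence that ended in the reboot.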

On Thu, May 1, 2008 at 6:10 AM, Doug Knight <[EMAIL PROTECTED]> wrote:
> Serge,
>  I kept the one I had been using from the previous install, which seems
>  to work OK. I thought 2.1.3 was the latest build? Which build do you
>  recommend I grab pgsql from? Using the old pgsql I was able to bring up
>  heartbeat on the secondary. Today I upgrade the primary, so it would be
>  nice for someone to suggest how to prevent the emergency reboot from
>  happening. I plan on "unmanaging" the resources during the upgrade, as
>  opposed to taking them down.
>
>  Thanks,
>  Doug
>
>
>
>  On Wed, 2008-04-30 at 13:40 -0600, Serge Dubrouski wrote:
>
>  > Doug -
>  >
>  > Regarding pgsql: the version of that OCF script that comes with 2.1.3
>  > isn't "the best" one. I'd strongly recommend getting a newer one from
>  > the later builds.
>  >
>  > On Wed, Apr 30, 2008 at 1:32 PM, Doug Knight <[EMAIL PROTECTED]> wrote:
>  > > Hi,
>  > >  I am performing a rolling upgrade on a RHEL5 system. Old HA was 2.0.8,
>  > >  upgrading to 2.1.3, Primary is 2.0.8 and up, secondary was the one being
>  > >  upgraded. During the startup I encountered some issues with my OCF
>  > >  scripts for our applications, which I have now corrected (mainly the
>  > >  relocation of the ocf-shellfuncs, etc). The upgraded node did come up
>  > >  and connect to the primary server (though it decided to try restarting
>  > >  postgres locally when it wasn't supposed to, more in a later email,
>  > >  maybe). There are two things that concern me. First, I saw a warning as
>  > >  follows:
>  > >
>  > >  WARN: crm_peer_init: Set these options via openais.conf
>  > >
>  > >  I did not install AIS, I stayed with the heartbeat-only stack
>  > >  (heartbeat, common, resource, heartbeat-pacemaker, etc). Should I be
>  > >  concerned about this warning, and if so what should I do about it?
>  > >
>  > >  Second, once I let the systems settle out and the logs got quiet, I
>  > >  checked status on my resources. As noted previously, pgsql had problems.
>  > >  I attempted to clean pgsql (crm_resource -C -r pgsql_5432, which stated
>  > >  I needed to use -H, which I did), and I got an emergency condition in
>  > >  heartbeat and it rebooted my server! So aside from the pgsql issue, how
>  > >  can I prevent heartbeat from doing a reboot? There are other things
>  > >  running on this server which a reboot plays havoc with, so I would like
>  > >  to avoid a repeat if possible.
>  > >
>  > >  Thanks,
>  > >  Doug Knight
>  > >  WSI Corp
>  > >  p.s. below is the log from the point that pgsql was frozen to the
>  > >  reboot.
>  > >
>  > >  lrmd[5476]: 2008/04/30_13:54:03 notice: on_msg_perform_op: resource
>  > >  pgsql_5432 is frozen, no ops can run.
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_lrm_rsc_op: Operation monitor
>  > >  on pgsql_5432 failed: -1
>  > >  crmd[5479]: 2008/04/30_13:54:03 WARN: do_log: [[FSA]] Input I_FAIL from
>  > >  do_lrm_rsc_op() received in state (S_NOT_DC)
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State
>  > >  transition S_NOT_DC -> S_RECOVERY [ input=I_FAIL cause=C_FSA_INTERNAL
>  > >  origin=do_lrm_rsc_op ]
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_recover: Action A_RECOVER
>  > >  (0000000001000000) not supported
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_log: [[FSA]] Input I_TERMINATE
>  > >  from do_recover() received in state (S_RECOVERY)
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_state_transition: State
>  > >  transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE
>  > >  cause=C_FSA_INTERNAL origin=do_recover ]
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_shutdown: All subsystems
>  > >  stopped, continuing
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
>  > >  get_vortex_rpm_conus_4km_ingestor_HA was active at shutdown.  You may
>  > >  ignore this error if it is unmanaged.
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
>  > >  get_vortex_rpm_ingestor_HA was active at shutdown.  You may ignore this
>  > >  error if it is unmanaged.
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
>  > >  pgsql_5432 was active at shutdown.  You may ignore this error if it is
>  > >  unmanaged.
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
>  > >  get_sat_hd_vsir_ingestor_HA was active at shutdown.  You may ignore this
>  > >  error if it is unmanaged.
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: verify_stopped: Resource
>  > >  get_vortex_etagfs_ingestor_HA was active at shutdown.  You may ignore
>  > >  this error if it is unmanaged.
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_lrm_control: Disconnected from
>  > >  the LRM
>  > >  ccm[5474]: 2008/04/30_13:54:03 info: client (pid=5479) removed from ccm
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_ha_control: Disconnected from
>  > >  Heartbeat
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_cib_control: Disconnecting CIB
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: crmd_cib_connection_destroy:
>  > >  Connection to the CIB terminated...
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_exit: Performing A_EXIT_0 -
>  > >  gracefully exiting the CRMd
>  > >  crmd[5479]: 2008/04/30_13:54:03 ERROR: do_exit: Could not recover from
>  > >  internal error
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: free_mem: Dropping I_TERMINATE:
>  > >  [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry
>  > >  for node 1
>  > >  cib[5475]: 2008/04/30_13:54:03 WARN: send_via_callback_channel: Client
>  > >  4e142eb9-e202-4a30-98f0-de2091d78976 has disconnected
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: destroy_crm_node: Destroying entry
>  > >  for node 0
>  > >  cib[5475]: 2008/04/30_13:54:03 WARN: do_local_notify: A-Sync reply to
>  > >  5479 failed: client left before we could send reply
>  > >  crmd[5479]: 2008/04/30_13:54:03 info: do_exit: [crmd] stopped (2)
>  > >  heartbeat[5455]: 2008/04/30_13:54:03 WARN:
>  > >  Managed /usr/lib64/heartbeat/crmd process 5479 exited with return code
>  > >  2.
>  > >  heartbeat[5455]: 2008/04/30_13:54:03 EMERG: Rebooting system.
>  > >  Reason: /usr/lib64/heartbeat/crmd
>  > >
>  > >  _______________________________________________
>  > >  Linux-HA mailing list
>  > >  [email protected]
>  > >  http://lists.linux-ha.org/mailman/listinfo/linux-ha
>  > >  See also: http://linux-ha.org/ReportingProblems
>  > >
>  >
>  >
>  >
>



-- 
Serge Dubrouski.
