Hi,

I'm currently running Pacemaker 0.6.5 and preparing an upgrade to 1.0-tip (currently 0de73ec89e02), using packages I compiled myself on Debian Lenny amd64.
To check that everything works before applying the upgrade to my production environment, I created two VMs and (partly) replicated my cluster on those two machines. For the upgrade, I shut down heartbeat on both members, following one of the strategies outlined in the "Configuration Explained" document. I then performed the upgrade and restarted heartbeat on one of the nodes, but it failed (the node rebooted). I couldn't spot anything in the long log, so I decided to upgrade the configuration to the 1.0 style as explained in the manual. I sent the upgraded configuration (the conversion produced no errors) to the node with cibadmin -R, and once again the node rebooted. This time I read the log carefully and found that the Policy Engine was crashing with a floating-point exception:

Feb 25 18:16:47 debian2 crmd: [2278]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1235582207-14, seq=1, quorate=1
Feb 25 18:16:47 debian2 crmd: [2278]: WARN: Managed pengine process 2284 killed by signal 8 [SIGFPE - Floating-point exception].
Feb 25 18:16:47 debian2 crmd: [2278]: ERROR: Managed pengine process 2284 dumped core
Feb 25 18:16:47 debian2 crmd: [2278]: info: crmdManagedChildDied: Process pengine:[2284] exited (signal=8, exitcode=0)
Feb 25 18:16:47 debian2 crmd: [2278]: info: pe_msg_dispatch: Received HUP from pengine:[2284]
Feb 25 18:16:47 debian2 crmd: [2278]: CRIT: pe_connection_destroy: Connection to the Policy Engine failed (pid=2284, uuid=e8c74b56-035f-4eba-9e94-f0cbf6e461af)
Feb 25 18:16:47 debian2 crmd: [2278]: info: pe_msg_dispatch: Received HUP from pengine:[-1]
Feb 25 18:16:47 debian2 crmd: [2278]: CRIT: pe_connection_destroy: Connection to the Policy Engine failed (pid=-1, uuid=365758b2-6b77-4a9d-8f3e-04a109db1c4d)
Feb 25 18:16:47 debian2 crmd: [2278]: notice: save_cib_contents: Saved CIB contents after PE crash to /var/lib/heartbeat/pengine/pe-core-e8c74b56-035f-4eba-9e94-f0cbf6e461af.bz2
Feb 25 18:16:47 debian2 crmd: [2278]: notice: save_cib_contents: Saved CIB contents after PE crash to /var/lib/heartbeat/pengine/pe-core-365758b2-6b77-4a9d-8f3e-04a109db1c4d.bz2

I have a core file, the archived CIB, the CIB before and after conversion, the full unedited logs...

The pengine process crashed in common_unpack.

I can reproduce the crash as soon as I put this CIB in place, so there is most likely something wrong with this CIB.

Where should I file a complete bug report?

Thanks,
--
Brice Figureau
My Blog: http://www.masterzen.fr/

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
