Hi,

I'm currently running Pacemaker 0.6.5 and I'm preparing an upgrade
to 1.0-tip (currently 0de73ec89e02), using packages I compiled myself
on Debian lenny amd64.

To check everything works before applying the upgrade on my production
environment, I created two VMs and (partly) replicated my cluster on
those two machines.

To upgrade, I shut down heartbeat on both members, following one of the
strategies outlined in the "Configuration Explained" document.

I then performed the upgrade and restarted heartbeat on one of the
nodes, but it failed (the node rebooted).
I couldn't spot anything in the long log, so I decided to upgrade the
configuration to the 1.0 style as explained in the manual.

I loaded the upgraded config (the conversion reported no errors) onto the
node with cibadmin -R, and once again the node rebooted.

This time I carefully read the log and found that the Policy Engine was
crashing with a floating point exception:

Feb 25 18:16:47 debian2 crmd: [2278]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1235582207-14, seq=1, quorate=1
Feb 25 18:16:47 debian2 crmd: [2278]: WARN: Managed pengine process 2284 killed by signal 8 [SIGFPE - Floating-point exception].
Feb 25 18:16:47 debian2 crmd: [2278]: ERROR: Managed pengine process 2284 dumped core
Feb 25 18:16:47 debian2 crmd: [2278]: info: crmdManagedChildDied: Process pengine:[2284] exited (signal=8, exitcode=0)
Feb 25 18:16:47 debian2 crmd: [2278]: info: pe_msg_dispatch: Received HUP from pengine:[2284]
Feb 25 18:16:47 debian2 crmd: [2278]: CRIT: pe_connection_destroy: Connection to the Policy Engine failed (pid=2284, uuid=e8c74b56-035f-4eba-9e94-f0cbf6e461af)
Feb 25 18:16:47 debian2 crmd: [2278]: info: pe_msg_dispatch: Received HUP from pengine:[-1]
Feb 25 18:16:47 debian2 crmd: [2278]: CRIT: pe_connection_destroy: Connection to the Policy Engine failed (pid=-1, uuid=365758b2-6b77-4a9d-8f3e-04a109db1c4d)
Feb 25 18:16:47 debian2 crmd: [2278]: notice: save_cib_contents: Saved CIB contents after PE crash to /var/lib/heartbeat/pengine/pe-core-e8c74b56-035f-4eba-9e94-f0cbf6e461af.bz2
Feb 25 18:16:47 debian2 crmd: [2278]: notice: save_cib_contents: Saved CIB contents after PE crash to /var/lib/heartbeat/pengine/pe-core-365758b2-6b77-4a9d-8f3e-04a109db1c4d.bz2
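
Since the crash was easy to miss in a long log, here is a minimal Python sketch for picking out the "killed by signal" reports. It is only an illustration: the function name find_killed_children and the regular expression are my own assumptions, derived from the single WARN line quoted above, not from any documented log format.

```python
import re

# Pattern modelled on the one WARN line quoted in this mail (an assumption,
# not a documented format). \s+ is used between words so the pattern still
# matches when a mail client has wrapped the log line.
KILLED_RE = re.compile(
    r"Managed\s+(?P<name>\S+)\s+process\s+(?P<pid>\d+)\s+killed\s+by\s+"
    r"signal\s+(?P<signo>\d+)\s+\[(?P<signame>[A-Z0-9]+)"
)


def find_killed_children(log_text):
    """Return (process name, pid, signal number, signal name) per report."""
    return [
        (m.group("name"), int(m.group("pid")),
         int(m.group("signo")), m.group("signame"))
        for m in KILLED_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "Feb 25 18:16:47 debian2 crmd: [2278]: WARN: Managed pengine "
        "process 2284 killed \n"
        "by signal 8 [SIGFPE - Floating-point exception].\n"
    )
    for report in find_killed_children(sample):
        print(report)
```

To scan a real log, read the whole file into a string and pass it to find_killed_children; matching on the full text rather than line by line is what lets wrapped entries match.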

I have a core file, the archived CIB, the CIB before and after conversion, and
the full unedited logs.
The pengine process crashed in common_unpack.
I can reproduce the crash as soon as I put this CIB in place, so there is
presumably something wrong in this CIB.
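
Before attaching the saved pe-core file to a report, it can be inspected with a short Python sketch like the one below. It assumes the file is a bzip2-compressed XML document, which the .bz2 suffix and the "Saved CIB contents" message suggest but which I have not verified; load_saved_cib and count_tags are hypothetical helper names.

```python
import bz2
import xml.etree.ElementTree as ET


def load_saved_cib(path):
    """Decompress a saved CIB and return its XML root element.

    Assumes the file is bzip2-compressed XML (an assumption based on
    the file suffix and the log message, not on documentation).
    """
    with bz2.open(path, "rb") as fh:
        return ET.parse(fh).getroot()


def count_tags(root):
    """Count how often each element tag occurs, as a quick overview."""
    counts = {}
    for elem in root.iter():
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python inspect_cib.py <path to a pe-core-....bz2 file>
    root = load_saved_cib(sys.argv[1])
    print(root.tag, root.attrib)
    for tag, n in sorted(count_tags(root).items()):
        print(tag, n)
```

If the file parses, the tag counts give a rough picture of what the configuration contains, which may help narrow down which part of the CIB triggers the crash.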

Where should I file a complete bug report?

Thanks,
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
