I've got a situation where heartbeat 2.1.4 shut down and created a core file. It was triggered in the update_ackseq() function in heartbeat.c, when hist->lowseq is greater than hist->ackseq. In that case heartbeat calls abort(), which raises SIGABRT and produces a core file (the default signal action). Heartbeat then restarted and proceeded as normal.
Whenever any process creates a core file in our customer installations, this causes one or more alarms to be raised, causing panic at the customer site. This is good, as long as there is a real problem that needs to be solved. Typically, a new core file at a customer site is a symptom of an issue that does require product support attention. Therefore, we run all of our processes in an environment where the creation of core files is permitted.
My question is this: is this a real problem that requires attention, or would it be worth considering having this section of code not create a core file? I also see that there are other sections of code within heartbeat.c and hb_api.c that call abort() and create a core file. Do any of these locations indicate a problem that requires customer support attention, or is it typical to allow heartbeat to simply restart itself and proceed as normal?
I am considering creating a patch for heartbeat that adds a new keyword to the ha.cf config file that determines whether to call abort() or exit(-1) at these locations. We would set the configuration to call exit(-1) and not create a core file if, in fact, these particular situations are reasonably managed by having heartbeat restart and proceed. Does this seem to be a reasonable approach to handling this behavior at a customer site?
Joe Horvath
[email protected]
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
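To make the proposed patch concrete, here is a rough sketch of the kind of switch I have in mind. It is only an illustration: the keyword name fatal_action, its values "abort" and "exit", and the function names below are placeholders of mine and do not exist in heartbeat 2.1.4. I have not shown how the keyword would be wired into the ha.cf parser.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical setting; heartbeat 2.1.4 has no such keyword. */
enum fatal_action { FATAL_ABORT, FATAL_EXIT };

/* Default keeps today's behavior: abort() and leave a core file. */
static enum fatal_action fatal_action_cfg = FATAL_ABORT;

/*
 * Apply the value of the hypothetical "fatal_action" keyword.
 * Returns 0 on success, or -1 if the value is not recognized,
 * in which case the current setting is left unchanged.
 */
int set_fatal_action(const char *value)
{
    if (strcmp(value, "abort") == 0) {
        fatal_action_cfg = FATAL_ABORT;
        return 0;
    }
    if (strcmp(value, "exit") == 0) {
        fatal_action_cfg = FATAL_EXIT;
        return 0;
    }
    return -1;
}

/*
 * Would replace the bare abort() calls at the locations in question.
 * Logs the reason, then either exits without a core file or aborts.
 */
void fatal_inconsistency(const char *what)
{
    fprintf(stderr, "fatal inconsistency: %s\n", what);
    if (fatal_action_cfg == FATAL_EXIT) {
        exit(EXIT_FAILURE);   /* no SIGABRT, so no core file */
    }
    abort();                  /* SIGABRT; core file by default */
}
```

With something like this, the check in update_ackseq() would call fatal_inconsistency("lowseq > ackseq") instead of abort() directly, and our installations would set the keyword to "exit".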
