I have a situation where heartbeat 2.1.4 shut down and created a core file.
The cause was the update_ackseq() function in heartbeat.c, which found
hist->lowseq greater than hist->ackseq and called abort(). That raises
SIGABRT, whose default action terminates the process and creates a core file.
Heartbeat subsequently restarted and proceeded as normal.
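
To make the failing condition concrete, here is a minimal sketch of the kind
of check I am describing. It is not the actual heartbeat source: the struct
layout, the seqno_t type, and the helper names hist_is_consistent() and
check_hist_or_abort() are my own assumptions for illustration. Only the field
names lowseq and ackseq and the call to abort() come from what I saw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-in, NOT the real heartbeat type. Only the field
 * names lowseq and ackseq are taken from heartbeat.c. */
typedef unsigned long seqno_t;

struct msg_hist {
    seqno_t lowseq;
    seqno_t ackseq;
};

/* Returns true when lowseq does not exceed ackseq, i.e. the state in
 * which the check described above does not fire. */
bool hist_is_consistent(const struct msg_hist *hist)
{
    assert(hist != NULL);
    return hist->lowseq <= hist->ackseq;
}

/* Hypothetical wrapper: abort() raises SIGABRT, whose default action
 * terminates the process and may write a core file. */
void check_hist_or_abort(const struct msg_hist *hist)
{
    if (!hist_is_consistent(hist)) {
        abort();
    }
}
```

With lowseq = 10 and ackseq = 9 the wrapper would abort; with lowseq = 5 and
ackseq = 9 it returns normally.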

Whenever any process creates a core file in our customer installations, this
causes one or more alarms to be raised, causing panic at the customer site.
This is good, as long as there is a real problem that needs to be solved.
Typically, a new core file at the customer site is a symptom of an issue that
does require product support attention. Therefore, we run all of our processes
in an environment where the creation of core files is permitted.

My question is this: is this a real problem that requires attention, or would
it be worth considering having this section of code not create a core file? I
also see other sections of code within heartbeat.c and hb_api.c that call
abort() and create a core file. Do any of these locations require customer
support attention, or is it typical to allow heartbeat to simply restart
itself and proceed as normal?

 
I am considering creating a patch for heartbeat that adds a new keyword to the
ha.cf config file that determines whether to call abort() or exit(-1) at these
locations. We would set the configuration to call exit(-1) and not create a
core file if, in fact, these particular situations are reasonably managed by
having heartbeat restart and proceed. Does this seem to be a reasonable
approach to handling this behavior at a customer site?
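
A rough sketch of what I have in mind follows. Everything named here is
hypothetical: heartbeat has no "fatal_action" keyword today, and
set_fatal_action(), fatal_error(), and run_fatal_in_child() are names I made
up for illustration. The last helper only exists to show how a parent process
sees the two outcomes.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical setting; not an existing ha.cf directive. */
enum fatal_action { FATAL_ABORT, FATAL_EXIT };

/* Default keeps the current behavior: abort() and a possible core file. */
static enum fatal_action fatal_action_cfg = FATAL_ABORT;

/* Parse the value of the hypothetical "fatal_action" keyword.
 * Returns 0 on success, -1 for an unrecognized value. */
int set_fatal_action(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "abort") == 0) {
        fatal_action_cfg = FATAL_ABORT;
        return 0;
    }
    if (strcmp(value, "exit") == 0) {
        fatal_action_cfg = FATAL_EXIT;
        return 0;
    }
    return -1;
}

/* Would be called in place of a bare abort(). Never returns. */
void fatal_error(const char *why)
{
    assert(why != NULL);
    fprintf(stderr, "fatal: %s\n", why);
    if (fatal_action_cfg == FATAL_EXIT) {
        exit(-1);   /* no signal, no core; the parent sees exit status 255 */
    }
    abort();        /* SIGABRT; a core file if the core size limit allows it */
}

/* Demonstration helper: run fatal_error() in a child process and return
 * the raw wait status, or -1 if fork() or waitpid() fails. */
int run_fatal_in_child(const char *why)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        fatal_error(why);
        _exit(127);  /* not reached; fatal_error() never returns */
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return status;
}
```

With the setting at "exit", the parent sees WIFEXITED() true and an exit
status of 255. With "abort", it sees WIFSIGNALED() true and WTERMSIG() equal
to SIGABRT, which is the case that produces the core file and our alarms.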

Joe Horvath
[email protected]


      
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
