Hello all,
I am testing the heartbeat development tree (f565f764587d) and found that
pengine processes are left in defunct (zombie) state. The scenario is:
1. run two servers in master/slave mode
2. initially, the pengine and tengine processes are running on the master server
3. boot the slave node
4. heartbeat tries to move the tengine and pengine processes to the slave node
5. a defunct pengine process appears
In crm/pengine/native.c, the PromoteRsc() function calls the CRM_CHECK
macro, which ends up in crm_abort() (lib/crm/common/utils.c), and the
following error is logged:
pengine[10426]: 2007/07/10_14:44:14 ERROR: crm_abort: PromoteRsc: Forked child 10441 to record non-fatal assert at native.c:1157 : rsc->next_role == RSC_ROLE_MASTER
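To illustrate why a zombie appears, here is a minimal, hypothetical sketch of the pattern the log message describes: fork a child to record the assert, log in the parent, and return immediately. It is not heartbeat's code. record_nonfatal_assert() and child_is_zombie() are made-up names, and the child simply calls _exit() where the real child presumably records the assert (for example by dumping core). child_is_zombie() uses waitid() with WNOWAIT to check that the child has exited but has not been reaped, which is the state ps reports as <defunct>.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch of "fork a child to record a non-fatal assert".
 * NOT heartbeat's implementation. The child calls _exit() as a stand-in
 * for whatever the real child does (e.g. abort() to leave a core file);
 * _exit() also avoids flushing the parent's stdio buffers twice.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
pid_t record_nonfatal_assert(const char *function, const char *file,
                             int line, const char *condition)
{
    pid_t pid = fork();

    switch (pid) {
    case -1: /* fork failed: only log in the parent */
        fprintf(stderr, "%s: assert at %s:%d : %s (fork failed)\n",
                function, file, line, condition);
        return -1;
    case 0: /* child: stand-in for recording the assert */
        _exit(EXIT_SUCCESS);
    default: /* parent: log and return at once, never reaping the child */
        fprintf(stderr,
                "%s: Forked child %d to record non-fatal assert at %s:%d : %s\n",
                function, (int)pid, file, line, condition);
        return pid;
    }
}

/*
 * Returns 1 if the child has exited but has not been reaped (a zombie),
 * 0 if it is still running, -1 on error. WNOWAIT leaves the child in the
 * waitable state, so this check does not reap it.
 */
int child_is_zombie(pid_t pid)
{
    siginfo_t info;

    assert(pid > 0);
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) == -1) {
        return -1;
    }
    return info.si_pid == pid ? 1 : 0;
}
```

Calling record_nonfatal_assert() and then checking child_is_zombie() after the child has had time to exit shows the leftover entry; it disappears only when something finally calls wait() or waitpid() on that pid.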
From my investigation, the parent process never waits for the child
process to exit. I made the patch below and confirmed that, with it
applied, no defunct process occurs. However, I am not sure whether this
is the right approach for heartbeat. Could you advise?
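The idea behind the patch, sketched as standalone POSIX C: the parent reaps the child it forked before returning, so no <defunct> entry is left behind. record_nonfatal_assert_and_reap() is a hypothetical name, the child's _exit() stands in for whatever the real child does, and unlike the patch this sketch uses waitpid() on the specific pid (retrying on EINTR) instead of wait(), so it cannot reap an unrelated child by accident.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not heartbeat's code: fork a child to record a
 * non-fatal assert, then reap exactly that child before returning.
 * Returns the child's raw wait status, or -1 on error.
 */
int record_nonfatal_assert_and_reap(const char *function, const char *condition)
{
    int status = 0;
    pid_t pid;

    assert(function != NULL && condition != NULL);

    pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        _exit(EXIT_SUCCESS); /* child: stand-in for recording the assert */
    }

    fprintf(stderr, "%s: Forked child %d to record non-fatal assert: %s\n",
            function, (int)pid, condition);

    /* Block until this particular child exits; retry if a signal interrupts. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return status;
}
```

The trade-off is the same as in the patch: the parent blocks until the child has finished, which is fine if the child exits quickly.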
Best Regards
MATSUDA, Daiki
--- utils.c.orig 2007-07-10 15:11:05.000000000 +0900
+++ utils.c 2007-07-10 15:11:00.000000000 +0900
@@ -24,6 +24,7 @@
#include <sys/param.h>
#include <sys/types.h>
+#include <sys/wait.h>
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>
@@ -1429,6 +1430,7 @@
const char *assert_condition, gboolean do_fork)
{
int pid = 0;
+ int status;
if(do_fork == FALSE) {
do_crm_log(LOG_ERR,
@@ -1454,6 +1456,7 @@
do_crm_log(LOG_ERR,
"%s: Forked child %d to record non-fatal assert at %s:%d : %s",
function, pid, file, line, assert_condition);
+ wait(&status);
return;
case 0: /* Child */
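The patch calls wait(), which blocks pengine until a child exits and reaps whichever child happens to exit first. If blocking is a concern (for example, if the child takes a while to record the assert), one commonly used alternative is to reap children asynchronously from a SIGCHLD handler. This is generic POSIX code, not heartbeat code: install_child_reaper(), reap_children() and reaped_children are hypothetical names, and whether a process-wide handler that calls waitpid(-1, ...) coexists with any child tracking heartbeat already does is an open question, since it would also consume exit statuses that other code may be waiting for.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped by the handler (for demonstration only). */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: reap every child that has exited, without blocking. */
static void reap_children(int signo)
{
    int saved_errno = errno; /* waitpid() may overwrite errno */

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        reaped_children++;
    }
    errno = saved_errno;
}

/* Install the reaper for the whole process; returns 0 on success, -1 on error. */
int install_child_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART: restart interrupted syscalls where possible;
     * SA_NOCLDSTOP: no SIGCHLD when a child is merely stopped. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

With the reaper installed once at startup, the forking code can return immediately after fork() and the exited child is collected when SIGCHLD arrives.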
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/