[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178608#comment-16178608 ]
Kuien Liu edited comment on HAWQ-1529 at 9/25/17 8:31 AM:
----------------------------------------------------------

A possible patch looks strange but does work.

{code:diff}
--- a/src/backend/resourcemanager/resourcemanager_RMSEG.c
+++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c
@@ -26,6 +26,7 @@
 #include "communication/rmcomm_MessageServer.h"
 #include "communication/rmcomm_RMSEG2RM.h"
 #include "resourceenforcer/resourceenforcer.h"
+#include "storage/pmsignal.h"  /* PostmasterIsAlive */
 #include "cdb/cdbtmpdir.h"
 
 int ResManagerMainSegment2ndPhase(void)
{code}

was (Author: kuien):
A possible patch looks strange but does work.

{code:diff}
--- a/src/backend/resourcemanager/resourcemanager_RMSEG.c
+++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c
@@ -26,6 +26,7 @@
 #include "communication/rmcomm_MessageServer.h"
 #include "communication/rmcomm_RMSEG2RM.h"
 #include "resourceenforcer/resourceenforcer.h"
+#include "storage/pmsignal.h"  /* PostmasterIsAlive */
 #include "cdb/cdbtmpdir.h"
 
 int ResManagerMainSegment2ndPhase(void)
@@ -156,7 +157,7 @@ int MainHandlerLoop_RMSEG(void)
 	DRMGlobalInstance->ResourceManagerStartTime = gettime_microsec();
 	while( DRMGlobalInstance->ResManagerMainKeepRun )
 	{
-		if (!PostmasterIsAlive(true)) {
+		if (0 == PostmasterIsAlive(true)) {
 			DRMGlobalInstance->ResManagerMainKeepRun = false;
 			elog(LOG, "Postmaster is not alive, resource manager exits");
 			break;
{code}

> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
>                 Key: HAWQ-1529
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Kuien Liu
>            Assignee: Radar Lei
>
> If I send SIGKILL to postmaster of segment by 'kill -9', then postmaster
> dies, BUT "segment resource manager" and "logger process" are still alive and
> flushing "WARNING" each 30s.
> To my understanding, "logger process" is waiting for "segment resource
> manager", but the resource manager will not detect the alive-status of
> postmaster and continue waiting. Does it make sense? Why not quit in case of
> postmaster gone?
>
> The call stack of RM when postmaster is killed:
> #0  0x00007f19023ccab6 in poll () from /lib64/libc.so.6
> #1  0x0000000000a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2  0x0000000000a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3  0x0000000000a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4  0x0000000000a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5  0x0000000000a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6  0x0000000000899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7  0x000000000089a562 in do_reaper () at postmaster.c:4021
> #8  0x00000000008969bb in ServerLoop () at postmaster.c:2136
> #9  0x0000000000895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x00000000007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x00007f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x00000000004bde89 in _start ()
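
For illustration only: the stack above shows the resource manager sitting in poll() inside its main loop. A loop of that shape can notice that another process is gone only if it wakes up on a timeout and actually checks. Below is a minimal sketch of that general pattern, assuming a POSIX system. The names process_is_alive and run_until_process_gone are hypothetical, and this is not how HAWQ's PostmasterIsAlive is implemented.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether a process with this pid exists.
 * Signal 0 delivers nothing; kill() only performs the existence and
 * permission checks.
 */
static bool process_is_alive(pid_t pid)
{
    assert(pid > 0);

    if (kill(pid, 0) == 0)
        return true;

    /* EPERM: the process exists but we may not signal it. */
    return errno == EPERM;
}

/*
 * Hypothetical main loop: check the watched process before each wait,
 * then sleep in poll() for at most timeout_ms. Returns the number of
 * completed iterations; a value below max_iterations means the loop
 * stopped because the watched process disappeared.
 */
static int run_until_process_gone(pid_t watched, int timeout_ms,
                                  int max_iterations)
{
    int i;

    for (i = 0; i < max_iterations; i++) {
        if (!process_is_alive(watched))
            break;

        /* No descriptors here; poll() acts purely as a bounded wait. */
        (void) poll(NULL, 0, timeout_ms);
    }
    return i;
}
```

The check runs on every wake-up, so the timeout passed to poll() bounds how long the loop can keep running after the watched process has died.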