Kuien Liu created HAWQ-1529:
-------------------------------

Summary: "segment resource manager" will NOT exit when postmaster died
Key: HAWQ-1529
URL: https://issues.apache.org/jira/browse/HAWQ-1529
Project: Apache HAWQ
Issue Type: Improvement
Components: Core
Reporter: Kuien Liu
Assignee: Radar Lei

If I send SIGKILL to the postmaster of a segment with 'kill -9', the postmaster dies, but the "segment resource manager" and the "logger process" stay alive and keep emitting a WARNING every 30s.

As I understand it, the "logger process" is waiting for the "segment resource manager", but the resource manager never checks whether the postmaster is still alive, so it keeps waiting.

Does this make sense? Why not quit when the postmaster is gone?

The call stack of the RM when the postmaster is killed:

#0  0x00007f19023ccab6 in poll () from /lib64/libc.so.6
#1  0x0000000000a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
#2  0x0000000000a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
#3  0x0000000000a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
#4  0x0000000000a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
#5  0x0000000000a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
#6  0x0000000000899b89 in CommenceNormalOperations () at postmaster.c:3673
#7  0x000000000089a562 in do_reaper () at postmaster.c:4021
#8  0x00000000008969bb in ServerLoop () at postmaster.c:2136
#9  0x0000000000895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
#10 0x00000000007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
#11 0x00007f190231e994 in __libc_start_main () from /lib64/libc.so.6
#12 0x00000000004bde89 in _start ()

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
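
To illustrate the behavior asked for in this report, here is a minimal sketch of a poll-based loop that checks after every wake-up whether its parent process still exists. It is not the HAWQ implementation: the names parent_is_alive and handler_loop, the max_rounds parameter, and the return codes are hypothetical and chosen only for this example. It assumes the loop already calls poll() with a finite timeout, so a liveness check can run on each iteration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns true if a process with this pid still exists. */
static bool parent_is_alive(pid_t parent_pid)
{
    assert(parent_pid > 0);
    /* Signal 0 sends nothing; it only performs the existence/permission check. */
    if (kill(parent_pid, 0) == 0)
        return true;
    /* EPERM means the process exists but we are not allowed to signal it. */
    return errno == EPERM;
}

/*
 * Hypothetical main loop: poll with a bounded timeout and check the parent
 * after every wake-up.
 * Returns 0 when the parent is gone, -1 on a poll() error, and 1 when
 * max_rounds iterations have elapsed (the bound exists only for the demo).
 */
static int handler_loop(pid_t parent_pid, struct pollfd *fds, nfds_t nfds,
                        int timeout_ms, int max_rounds)
{
    for (int round = 0; round < max_rounds; round++) {
        int ready = poll(fds, nfds, timeout_ms);
        if (ready < 0 && errno != EINTR)
            return -1;
        if (!parent_is_alive(parent_pid))
            return 0;   /* parent died: the caller should clean up and exit */
        /* ... handle the ready descriptors here ... */
    }
    return 1;
}
```

A caller would record the postmaster's pid at startup, pass it to handler_loop, and exit when the loop returns 0. Note that kill(pid, 0) cannot tell the original process from an unrelated one that later reuses the same pid, so a real fix might prefer a different signal of parent death, such as a change in getppid().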