Kuien Liu created HAWQ-1529:
-------------------------------
Summary: "segment resource manager" will NOT exit when postmaster died
Key: HAWQ-1529
URL: https://issues.apache.org/jira/browse/HAWQ-1529
Project: Apache HAWQ
Issue Type: Improvement
Components: Core
Reporter: Kuien Liu
Assignee: Radar Lei
If I send SIGKILL to the postmaster of a segment with 'kill -9', the postmaster
dies, BUT the "segment resource manager" and "logger process" stay alive and
keep logging a "WARNING" every 30s.
As I understand it, the "logger process" is waiting for the "segment resource
manager", but the resource manager never checks whether the postmaster is
still alive and so keeps waiting. Is this intended? Why not exit once the
postmaster is gone?
The call stack of the RM after the postmaster is killed:
#0 0x00007f19023ccab6 in poll () from /lib64/libc.so.6
#1 0x0000000000a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
#2 0x0000000000a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
#3 0x0000000000a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
#4 0x0000000000a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
#5 0x0000000000a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
#6 0x0000000000899b89 in CommenceNormalOperations () at postmaster.c:3673
#7 0x000000000089a562 in do_reaper () at postmaster.c:4021
#8 0x00000000008969bb in ServerLoop () at postmaster.c:2136
#9 0x0000000000895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
#10 0x00000000007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
#11 0x00007f190231e994 in __libc_start_main () from /lib64/libc.so.6
#12 0x00000000004bde89 in _start ()
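The stack above shows the resource manager blocked in poll() inside its main loop. One possible way for such a loop to notice that the postmaster is gone is to check the parent pid on each iteration. The following is only a sketch of that idea, not the HAWQ code or a proposed patch: parent_is_alive(), poll_once_checking_parent() and PARENT_GONE are hypothetical names, and it assumes the process is a direct child of the postmaster and recorded the postmaster's pid at startup.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical return code meaning "the parent process has exited". */
#define PARENT_GONE (-2)

/*
 * Returns true while this process is still a child of expected_parent.
 * When the parent dies, the kernel re-parents the child (to init or a
 * subreaper), so getppid() stops matching the pid recorded at startup.
 */
bool parent_is_alive(pid_t expected_parent)
{
    return getppid() == expected_parent;
}

/*
 * One iteration of a poll-based loop with a parent liveness check.
 * Returns PARENT_GONE if the parent has exited; otherwise returns
 * whatever poll() returns. A finite timeout_ms bounds how long a dead
 * parent can go unnoticed.
 */
int poll_once_checking_parent(struct pollfd *fds, nfds_t nfds,
                              int timeout_ms, pid_t expected_parent)
{
    assert(fds != NULL || nfds == 0);

    if (!parent_is_alive(expected_parent))
        return PARENT_GONE;

    return poll(fds, nfds, timeout_ms);
}
```

A caller would record getppid() once at startup, pass it in on every iteration, and leave its main loop (and exit) when PARENT_GONE is returned.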