[ https://issues.apache.org/jira/browse/HAWQ-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lili Ma updated HAWQ-559:
-------------------------
Description:
After the first query finishes, the QE remains alive. We then run a second
query. Once the QD thread has been created and bound to the QE, but before it
has sent any data to the QE, we kill that QE and the QD hangs.
Here is the backtrace captured while the QD hangs:
{code}
* thread #1: tid = 0x1c4afd, 0x00007fff890355be libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff890355be libsystem_kernel.dylib`poll + 10
    frame #1: 0x000000010745692c postgres`receiveChunksUDP [inlined] udpSignalPoll + 42 at ic_udp.c:2882
    frame #2: 0x0000000107456902 postgres`receiveChunksUDP + 26 at ic_udp.c:2715
    frame #3: 0x00000001074568e8 postgres`receiveChunksUDP [inlined] waitOnCondition(timeout_us=250000) + 82 at ic_udp.c:1599
    frame #4: 0x0000000107456896 postgres`receiveChunksUDP(pTransportStates=0x00007ff2a381ae48, pEntry=0x00007ff2a18f2230, motNodeID=<unavailable>, srcRoute=0x00007fff58c0ce96, conn=<unavailable>, inTeardown='\0') + 726 at ic_udp.c:4039
    frame #5: 0x0000000107452a86 postgres`RecvTupleChunkFromAnyUDP [inlined] RecvTupleChunkFromAnyUDP_Internal + 498 at ic_udp.c:4146
    frame #6: 0x0000000107452894 postgres`RecvTupleChunkFromAnyUDP(mlStates=<unavailable>, transportStates=<unavailable>, motNodeID=1, srcRoute=0x00007fff58c0ce96) + 100 at ic_udp.c:4167
    frame #7: 0x0000000107442254 postgres`RecvTupleFrom [inlined] processIncomingChunks(mlStates=0x00007ff2a3812a30, transportStates=0x00007ff2a381ae48, motNodeID=1, srcRoute=<unavailable>) + 34 at cdbmotion.c:684
    frame #8: 0x0000000107442232 postgres`RecvTupleFrom(mlStates=0x00007ff2a3812a30, transportStates=<unavailable>, motNodeID=1, tup_i=0x00007fff58c0cf00, srcRoute=-100) + 370 at cdbmotion.c:610
    frame #9: 0x00000001071c8778 postgres`ExecMotion [inlined] execMotionUnsortedReceiver(node=<unavailable>) + 57 at nodeMotion.c:466
    frame #10: 0x00000001071c873f postgres`ExecMotion(node=<unavailable>) + 1071 at nodeMotion.c:298
    frame #11: 0x00000001071a4835 postgres`ExecProcNode(node=0x00007ff2a38164b8) + 613 at execProcnode.c:999
    frame #12: 0x00000001071b9f82 postgres`ExecAgg + 104 at nodeAgg.c:1163
    frame #13: 0x00000001071b9f1a postgres`ExecAgg + 316 at nodeAgg.c:1693
    frame #14: 0x00000001071b9dde postgres`ExecAgg(node=0x00007ff2a3815348) + 126 at nodeAgg.c:1138
    frame #15: 0x00000001071a4803 postgres`ExecProcNode(node=0x00007ff2a3815348) + 563 at execProcnode.c:979
    frame #16: 0x000000010719ecfd postgres`ExecutePlan(estate=0x00007ff2a3814e30, planstate=0x00007ff2a3815348, operation=CMD_SELECT, numberTuples=0, direction=<unavailable>, dest=0x00007ff2a28db178) + 1181 at execMain.c:3218
    frame #17: 0x000000010719e619 postgres`ExecutorRun(queryDesc=0x00007ff2a3811f00, direction=ForwardScanDirection, count=0) + 569 at execMain.c:1213
    frame #18: 0x00000001072e7fc2 postgres`PortalRun + 14 at pquery.c:1649
    frame #19: 0x00000001072e7fb4 postgres`PortalRun(portal=0x00007ff2a1893e30, count=<unavailable>, isTopLevel='\x01', dest=<unavailable>, altdest=0x00007ff2a28db178, completionTag=0x00007fff58c0d530) + 1124 at pquery.c:1471
    frame #20: 0x00000001072e4a8e postgres`exec_simple_query(query_string=0x00007ff2a380fe30, seqServerHost=0x0000000000000000, seqServerPort=-1) + 2078 at postgres.c:1745
    frame #21: 0x00000001072e0c4c postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>, username=0x00007ff2a201bcf0) + 9404 at postgres.c:4754
    frame #22: 0x000000010729a002 postgres`ServerLoop [inlined] BackendRun + 105 at postmaster.c:5889
    frame #23: 0x0000000107299f99 postgres`ServerLoop at postmaster.c:5484
    frame #24: 0x0000000107299f99 postgres`ServerLoop + 9593 at postmaster.c:2163
    frame #25: 0x0000000107296f3b postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) + 5019 at postmaster.c:1454
    frame #26: 0x0000000107200ca9 postgres`main(argc=9, argv=0x00007ff2a141eef0) + 1433 at main.c:209
    frame #27: 0x00007fff95e8c5c9 libdyld.dylib`start + 1
  thread #2: tid = 0x1c4afe, 0x00007fff890355be libsystem_kernel.dylib`poll + 10
    frame #0: 0x00007fff890355be libsystem_kernel.dylib`poll + 10
    frame #1: 0x000000010744d8e3 postgres`rxThreadFunc(arg=<unavailable>) + 2163 at ic_udp.c:6251
    frame #2: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
    frame #3: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
    frame #4: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13
  thread #3: tid = 0x1c4b02, 0x00007fff890343f6 libsystem_kernel.dylib`__select + 10
    frame #0: 0x00007fff890343f6 libsystem_kernel.dylib`__select + 10
    frame #1: 0x00000001074ec47e postgres`pg_usleep(microsec=<unavailable>) + 78 at pgsleep.c:43
    frame #2: 0x0000000107400c26 postgres`generateResourceRefreshHeartBeat(arg=0x00007ff2a141ce90) + 166 at rmcomm_QD2RM.c:1519
    frame #3: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
    frame #4: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
    frame #5: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13
{code}
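In the backtrace, thread #1 sits in receiveChunksUDP, waiting in poll through waitOnCondition(timeout_us=250000). A receiver that only retries after each timeout can wait forever if the sender has died before sending anything. The Python sketch below illustrates that general pattern and one possible guard. It is not HAWQ code and not the fix for this issue: recv_chunk, PeerDeadError, and the peer_alive callback are hypothetical names introduced for illustration.

```python
import select
import socket


class PeerDeadError(RuntimeError):
    """Raised when the sender is found dead while we wait for data."""


def recv_chunk(sock, peer_alive, timeout_s=0.25, bufsize=65536):
    """Wait for one message, re-checking sender liveness after each timeout.

    peer_alive is a hypothetical zero-argument callable that returns False
    once the sending process is known to be gone.
    """
    while True:
        # Block until the socket is readable or the timeout expires.
        readable, _, _ = select.select([sock], [], [], timeout_s)
        if readable:
            return sock.recv(bufsize)
        # Timed out with no data. Without this check the loop would keep
        # waiting indefinitely for a sender that no longer exists.
        if not peer_alive():
            raise PeerDeadError("sender died before sending any data")
```

With a live sender the call returns the first message; with a dead sender it raises after at most one timeout interval instead of waiting indefinitely. How liveness is actually detected (process check, connection state, heartbeat) is left to the caller in this sketch.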
> QD hangs when QE is killed after connected to QD
> ------------------------------------------------
>
> Key: HAWQ-559
> URL: https://issues.apache.org/jira/browse/HAWQ-559
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Dispatcher
> Affects Versions: 2.0.0
> Environment: Mac OS X 10.10
> Reporter: Chunling Wang
> Assignee: Lili Ma
>