[jira] [Resolved] (HAWQ-1338) In some case writer process crashed when running 'hawq stop cluster'

Ming LI (JIRA) Wed, 15 Feb 2017 23:41:14 -0800

     [ https://issues.apache.org/jira/browse/HAWQ-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ming LI resolved HAWQ-1338.
---------------------------
       Resolution: Fixed
    Fix Version/s: backlog

> In some case writer process crashed when running 'hawq stop cluster'
> --------------------------------------------------------------------
>
>                 Key: HAWQ-1338
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1338
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ming LI
>            Assignee: Ming LI
>             Fix For: backlog
>
>
> On the master node of the test machine, some processes do not exit cleanly 
> and dump core after a while.
> {code}
> ------------------- The running log  -------------------------
> 2/12/17 11:33:59 PM PST: ----------------------------------------------------------------------
> 2/12/17 11:33:59 PM PST: Check if postgres/java processes are closed properly:
> 2/12/17 11:33:59 PM PST: ----------------------------------------------------------------------
> 2/12/17 11:33:59 PM PST: Check if postgres|java process is running on test1: 
> 2/12/17 11:33:59 PM PST: gpadmin    5279      1  0 22:53 ?        00:00:03 postgres: port 31000, master logger process
> 2/12/17 11:33:59 PM PST: gpadmin    5283      1  0 22:53 ?        00:00:01 postgres: port 31000, writer process
> 2/12/17 11:33:59 PM PST: root      23864     24  1 23:37 ?        00:00:01 /usr/libexec/abrt-hook-ccpp 6 18446744073709551615 5283 501 501 1486971433 postgres
> 2/12/17 11:33:59 PM PST: -------------------------------------
> 2/12/17 11:33:59 PM PST: Check if postgres|java process is running on test2: 
> 2/12/17 11:33:59 PM PST: -------------------------------------
> 2/12/17 11:33:59 PM PST: Check if postgres|java process is running on test3: 
> 2/12/17 11:33:59 PM PST: -------------------------------------
> 2/12/17 11:33:59 PM PST: Check if postgres|java process is running on test4: 
> 2/12/17 11:33:59 PM PST: -------------------------------------
> 2/12/17 11:33:59 PM PST: Check if postgres|java process is running on test5: 
> 2/12/17 11:33:59 PM PST: -------------------------------------
> 2/12/17 11:33:59 PM PST: ERROR: Postgres process not closed on test1, please check.
> 2/12/17 11:33:59 PM PST: ----------------------------------------------------------------------
> ------------------- The call stack -------------------------
> (gdb) bt
> #0  0x00000032214325e5 in raise () from /lib64/libc.so.6
> #1  0x0000003221433dc5 in abort () from /lib64/libc.so.6
> #2  0x000000000096433a in errfinish (dummy=0) at elog.c:686
> #3  0x00000000009665bd in elog_finish (elevel=22, fmt=0xc53af0 "process is dying from critical section") at elog.c:1463
> #4  0x000000000086c11d in proc_exit_prepare (code=1) at ipc.c:153
> #5  0x000000000086c0a9 in proc_exit (code=1) at ipc.c:93
> #6  0x0000000000964300 in errfinish (dummy=0) at elog.c:670
> #7  0x0000000000825121 in ServiceClientRead (serviceClient=0xfc73f0, response=0x7fffb96842de, responseLen=1,
>     timeout=0x7fffb96842c0) at service.c:523
> #8  0x0000000000824f7b in ServiceClientReceiveResponse (serviceClient=0xfc73f0, response=0x7fffb96842de, responseLen=1,
>     timeout=0x7fffb96842c0) at service.c:480
> #9  0x000000000082bce1 in WalSendServerClientReceiveResponse (walSendResponse=0x7fffb96842de, timeout=0x7fffb96842c0)
>     at walsendserver.c:372
> #10 0x000000000051596d in XLogQDMirrorWaitForResponse (waitForever=0 '\000') at xlog.c:1919
> #11 0x0000000000515c0c in XLogQDMirrorWrite (startidx=0, npages=1, timeLineID=1, logId=0, logSeg=1, logOff=13729792)
>     at xlog.c:2005
> #12 0x0000000000516615 in XLogWrite (WriteRqst=..., flexible=0 '\000', xlog_switch=0 '\000') at xlog.c:2354
> #13 0x0000000000516d68 in XLogFlush (record=...) at xlog.c:2572
> #14 0x0000000000522f88 in CreateCheckPoint (shutdown=1 '\001', force=1 '\001') at xlog.c:8136
> #15 0x000000000052277b in ShutdownXLOG (code=0, arg=0) at xlog.c:7865
> #16 0x0000000000821f42 in BackgroundWriterMain () at bgwriter.c:318
> #17 0x000000000059c9f1 in AuxiliaryProcessMain (argc=2, argv=0x7fffb9684b60) at bootstrap.c:467
> #18 0x000000000083c7b0 in StartChildProcess (type=BgWriterProcess) at postmaster.c:6836
> #19 0x0000000000838f39 in CommenceNormalOperations () at postmaster.c:3618
> #20 0x000000000083984a in do_reaper () at postmaster.c:4021
> #21 0x0000000000835e97 in ServerLoop () at postmaster.c:2136
> #22 0x000000000083500f in PostmasterMain (argc=9, argv=0x288bd10) at postmaster.c:1454
> #23 0x00000000007612af in main (argc=9, argv=0x288bd10) at main.c:226
> {code}
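
The "Check if postgres|java process is running" step in the log above could be reproduced roughly as follows. This is only a sketch, not the test harness's actual code: it assumes the process lines follow the `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD), and the names `PsEntry`, `find_leftover_processes` and `SAMPLE_PS_OUTPUT` are hypothetical, introduced here for illustration. The sample input reuses the three process lines reported for test1, plus one made-up `sshd` line to show a non-match.

```python
import re
from typing import List, NamedTuple


class PsEntry(NamedTuple):
    """One parsed process line (hypothetical helper type)."""
    user: str
    pid: int
    ppid: int
    command: str


# Assumed column layout: UID PID PPID C STIME TTY TIME CMD (as in `ps -ef`).
_PS_LINE = re.compile(
    r"^(?P<user>\S+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+\S+\s+(?P<cmd>.+)$"
)


def find_leftover_processes(ps_output: str,
                            pattern: str = r"postgres|java") -> List[PsEntry]:
    """Return the entries whose command matches `pattern`.

    Lines that do not fit the assumed layout (for example the header
    line) are skipped rather than treated as errors.
    """
    matcher = re.compile(pattern)
    leftovers = []
    for line in ps_output.splitlines():
        m = _PS_LINE.match(line.strip())
        if m is None:
            continue
        command = m.group("cmd").strip()
        if matcher.search(command):
            leftovers.append(
                PsEntry(m.group("user"), int(m.group("pid")),
                        int(m.group("ppid")), command)
            )
    return leftovers


# Process lines taken from the test1 section of the log; the sshd line is
# a made-up example of a process that should not be reported.
SAMPLE_PS_OUTPUT = """\
UID        PID  PPID  C STIME TTY          TIME CMD
gpadmin    5279      1  0 22:53 ?        00:00:03 postgres: port 31000, master logger process
gpadmin    5283      1  0 22:53 ?        00:00:01 postgres: port 31000, writer process
root      23864     24  1 23:37 ?        00:00:01 /usr/libexec/abrt-hook-ccpp 6 18446744073709551615 5283 501 501 1486971433 postgres
root        900      1  0 22:00 ?        00:00:00 /usr/sbin/sshd
"""

if __name__ == "__main__":
    for entry in find_leftover_processes(SAMPLE_PS_OUTPUT):
        print(entry.pid, entry.command)
```

Note that the `abrt-hook-ccpp` line matches only because its argument list ends in `postgres`; its arguments also include 5283, the PID of the writer process listed above it.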



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
