[ 
https://issues.apache.org/jira/browse/HAWQ-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217624#comment-15217624
 ] 

ASF GitHub Bot commented on HAWQ-592:
-------------------------------------

Github user ztao1987 commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/532#issuecomment-203309240
  
    +1


> QD fails when connects to QE again in executormgr_allocate_any_executor()
> -------------------------------------------------------------------------
>
>                 Key: HAWQ-592
>                 URL: https://issues.apache.org/jira/browse/HAWQ-592
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Dispatcher
>    Affects Versions: 2.0.0
>            Reporter: Chunling Wang
>            Assignee: Lili Ma
>
> We first run a query so that some QEs are started. Then we kill one of them 
> and run "set log_min_messages=DEBUG1", which makes the QD reach 
> executormgr_allocate_any_executor(). The QD then fails.
> 1. Run a query to start some QEs.
> {code}
> dispatch=# select count(*) from test_dispatch as t1, test_dispatch as t2, 
> test_dispatch as t3 where t1.id *2 = t2.id and t1.id < t3.id;
>  count
> -------
>   3725
> (1 row)
> {code}
> {code}
> $ ps -ef|grep postgres
>   501 12817     1   0  4:41PM ??         0:00.36 /usr/local/hawq/bin/postgres 
> -D /Users/wangchunling/hawq-data-directory/masterdd -i -M master -p 5432 
> --silent-mode=true
>   501 12818 12817   0  4:41PM ??         0:00.01 postgres: port  5432, master 
> logger process
>   501 12821 12817   0  4:41PM ??         0:00.00 postgres: port  5432, stats 
> collector process
>   501 12822 12817   0  4:41PM ??         0:00.03 postgres: port  5432, writer 
> process
>   501 12823 12817   0  4:41PM ??         0:00.00 postgres: port  5432, 
> checkpoint process
>   501 12824 12817   0  4:41PM ??         0:00.00 postgres: port  5432, 
> seqserver process
>   501 12825 12817   0  4:41PM ??         0:00.00 postgres: port  5432, WAL 
> Send Server process
>   501 12826 12817   0  4:41PM ??         0:00.00 postgres: port  5432, DFS 
> Metadata Cache process
>   501 12827 12817   0  4:41PM ??         0:00.16 postgres: port  5432, master 
> resource manager
>   501 12844     1   0  4:41PM ??         0:00.57 /usr/local/hawq/bin/postgres 
> -D /Users/wangchunling/hawq-data-directory/segmentdd -i -M segment -p 40000 
> --silent-mode=true
>   501 12845 12844   0  4:41PM ??         0:00.01 postgres: port 40000, logger 
> process
>   501 12856 12862   0  4:42PM ??         0:00.05 postgres: port  5432, 
> wangchunling dispatch [local] con13 cmd10 idle [local]
>   501 12872 12844   0  4:42PM ??         0:00.00 postgres: port 40000, stats 
> collector process
>   501 12873 12844   0  4:42PM ??         0:00.01 postgres: port 40000, writer 
> process
>   501 12874 12844   0  4:42PM ??         0:00.00 postgres: port 40000, 
> checkpoint process
>   501 12875 12844   0  4:42PM ??         0:00.03 postgres: port 40000, 
> segment resource manager
> {code}
> 2. Kill a QE with kill -9 and wait for the segment to come back up.
> {code}
> $ ps -ef|grep postgres
>   501 12817     1   0  4:41PM ??         0:00.91 /usr/local/hawq/bin/postgres 
> -D /Users/wangchunling/hawq-data-directory/masterdd -i -M master -p 5432 
> --silent-mode=true
>   501 12818 12817   0  4:41PM ??         0:00.05 postgres: port  5432, master 
> logger process
>   501 12844     1   0  4:41PM ??         0:01.52 /usr/local/hawq/bin/postgres 
> -D /Users/wangchunling/hawq-data-directory/segmentdd -i -M segment -p 40000 
> --silent-mode=true
>   501 12845 12844   0  4:41PM ??         0:00.04 postgres: port 40000, logger 
> process
>   501 12872 12844   0  4:42PM ??         0:00.02 postgres: port 40000, stats 
> collector process
>   501 12873 12844   0  4:42PM ??         0:00.19 postgres: port 40000, writer 
> process
>   501 12874 12844   0  4:42PM ??         0:00.03 postgres: port 40000, 
> checkpoint process
>   501 12875 12844   0  4:42PM ??         0:00.41 postgres: port 40000, 
> segment resource manager
>   501 12932 12817   0  4:52PM ??         0:00.00 postgres: port  5432, stats 
> collector process
>   501 12933 12817   0  4:52PM ??         0:00.01 postgres: port  5432, writer 
> process
>   501 12934 12817   0  4:52PM ??         0:00.00 postgres: port  5432, 
> checkpoint process
>   501 12935 12817   0  4:52PM ??         0:00.00 postgres: port  5432, 
> seqserver process
>   501 12936 12817   0  4:52PM ??         0:00.00 postgres: port  5432, WAL 
> Send Server process
>   501 12937 12817   0  4:52PM ??         0:00.00 postgres: port  5432, DFS 
> Metadata Cache process
>   501 12938 12817   0  4:52PM ??         0:00.04 postgres: port  5432, master 
> resource manager
>   501 12952 12817   0  4:53PM ??         0:00.00 postgres: port  5432, 
> wangchunling dispatch [local] con30 idle [local]
> {code}
> {code}
> dispatch=# select * from gp_segment_configuration;
>  registration_order | role | status | port  |          hostname           |   
>         address           |            description
> --------------------+------+--------+-------+-----------------------------+-----------------------------+------------------------------------
>                   0 | m    | u      |  5432 | ChunlingdeMacBook-Pro.local | 
> ChunlingdeMacBook-Pro.local |
>                   1 | p    | d      | 40000 | localhost                   | 
> 127.0.0.1                   | resource manager process was reset
> (2 rows)
> dispatch=# select * from gp_segment_configuration;
>  registration_order | role | status | port  |          hostname           |   
>         address           | description
> --------------------+------+--------+-------+-----------------------------+-----------------------------+-------------
>                   0 | m    | u      |  5432 | ChunlingdeMacBook-Pro.local | 
> ChunlingdeMacBook-Pro.local |
>                   1 | p    | u      | 40000 | localhost                   | 
> 127.0.0.1                   |
> (2 rows)
> {code}
> 3. Run "set log_min_messages=DEBUG1" and observe that the QD fails.
> {code}
> dispatch=# set log_min_messages=DEBUG1;
> The connection to the server was lost. Attempting reset: Failed.
> !>
> {code}
> The backtrace when QD fails:
> {code}
> * thread #1: tid = 0x2ff2e7, 0x00007fff87d60380 
> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 64, queue = 
> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x00007fff87d60380 
> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 64
>     frame #1: 0x00007fff8a0a82e2 libsystem_c.dylib`__memcpy_chk + 22
>     frame #2: 0x000000010c299469 postgres`CopySegment(src=0x0000000000000000, 
> cxt=0x00007fa303d07d90) + 137 at cdbutil.c:168
>     frame #3: 0x000000010c2c0df2 
> postgres`executormgr_prepare_connect(segment=0x0000000000000000, 
> is_writer='\x01') + 34 at executormgr.c:983
>     frame #4: 0x000000010c2bec26 
> postgres`dispmgt_build_preconnect_info(segment=0x0000000000000000, 
> is_writer='\x01', executor=0x00007fa3048787d8, data=0x00007fa304877e30, 
> slice=0x00007fa3048781a0, task=0x00007fa3048781c0) + 182 at 
> dispatcher_mgt.c:568
>     frame #5: 0x000000010c2b9ec4 
> postgres`dispatcher_bind_executor(data=0x00007fa304877e30) + 244 at 
> dispatcher.c:956
>     frame #6: 0x000000010c2b9bbb 
> postgres`dispatch_run(data=0x00007fa304877e30) + 219 at dispatcher.c:1237
>     frame #7: 0x000000010c2bb2bf 
> postgres`dispatch_statement(stmt=0x00007fff540b96e8, 
> resource=0x0000000000000000, result=0x0000000000000000) + 271 at 
> dispatcher.c:1491
>     frame #8: 0x000000010c2bb1a3 
> postgres`dispatch_statement_string(string=0x00007fa3068e4e40, 
> serializeQuerytree=0x0000000000000000, serializeLenQuerytree=0, 
> resource=0x0000000000000000, result=0x0000000000000000, 
> sync_on_all_executors='\x01') + 307 at dispatcher.c:1537
>     frame #9: 0x000000010c12aad9 
> postgres`SetPGVariableDispatch(name=0x00007fa30401c120, 
> args=0x00007fa30401c1d8, is_local='\0') + 713 at guc.c:10891
>     frame #10: 0x000000010bfe7e7e 
> postgres`ProcessUtility(parsetree=0x00007fa30401c208, 
> queryString=0x00007fa3068d3e30, params=0x0000000000000000, isTopLevel='\x01', 
> dest=0x00007fa30401c568, completionTag=0x00007fff540b9f90) + 8318 at 
> utility.c:1519
>     frame #11: 0x000000010bfe5810 
> postgres`PortalRunUtility(portal=0x00007fa304821430, 
> utilityStmt=0x00007fa30401c208, isTopLevel='\x01', dest=0x00007fa30401c568, 
> completionTag=0x00007fff540b9f90) + 464 at pquery.c:1896
>     frame #12: 0x000000010bfe3e4b 
> postgres`PortalRunMulti(portal=0x00007fa304821430, isTopLevel='\x01', 
> dest=0x00007fa30401c568, altdest=0x00007fa30401c568, 
> completionTag=0x00007fff540b9f90) + 539 at pquery.c:2006
>     frame #13: 0x000000010bfe33b5 
> postgres`PortalRun(portal=0x00007fa304821430, count=9223372036854775807, 
> isTopLevel='\x01', dest=0x00007fa30401c568, altdest=0x00007fa30401c568, 
> completionTag=0x00007fff540b9f90) + 1269 at pquery.c:1523
>     frame #14: 0x000000010bfd9703 
> postgres`exec_simple_query(query_string=0x00007fa30401b830, 
> seqServerHost=0x0000000000000000, seqServerPort=-1) + 2179 at postgres.c:1745
>     frame #15: 0x000000010bfd7b50 postgres`PostgresMain(argc=4, 
> argv=0x00007fa30680ba10, username=0x00007fa30680b9d0) + 7472 at 
> postgres.c:4754
>     frame #16: 0x000000010bf7bfd6 
> postgres`BackendRun(port=0x00007fa303c18c50) + 1014 at postmaster.c:5889
>     frame #17: 0x000000010bf7b121 
> postgres`BackendStartup(port=0x00007fa303c18c50) + 385 at postmaster.c:5484
>     frame #18: 0x000000010bf77d90 postgres`ServerLoop + 1312 at 
> postmaster.c:2163
>     frame #19: 0x000000010bf763d3 postgres`PostmasterMain(argc=9, 
> argv=0x00007fa303d07a60) + 4931 at postmaster.c:1454
>     frame #20: 0x000000010be80af2 postgres`main(argc=9, 
> argv=0x00007fa303d07a60) + 978 at main.c:226
>     frame #21: 0x00007fff95e8c5c9 libdyld.dylib`start + 1
>   thread #2: tid = 0x2ff2e8, 0x00007fff890355be libsystem_kernel.dylib`poll + 
> 10
>     frame #0: 0x00007fff890355be libsystem_kernel.dylib`poll + 10
>     frame #1: 0x000000010c1e3fed 
> postgres`rxThreadFunc(arg=0x0000000000000000) + 317 at ic_udp.c:6251
>     frame #2: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
>     frame #3: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
>     frame #4: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13
>   thread #4: tid = 0x2ff41f, 0x00007fff890343f6 
> libsystem_kernel.dylib`__select + 10
>     frame #0: 0x00007fff890343f6 libsystem_kernel.dylib`__select + 10
>     frame #1: 0x000000010c2c6acb postgres`pg_usleep(microsec=1000000) + 91 at 
> pgsleep.c:43
>     frame #2: 0x000000010c1799ca 
> postgres`generateResourceRefreshHeartBeat(arg=0x00007fa303e03380) + 1482 at 
> rmcomm_QD2RM.c:1546
>     frame #3: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
>     frame #4: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
>     frame #5: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13
> {code}
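
In the backtrace above, frames #2 and #3 show CopySegment() being called with src=0x0 from executormgr_prepare_connect(segment=0x0), and the crash happens inside memmove. The following is a minimal sketch of the kind of defensive check that avoids copying from a null segment pointer. The type and function names (DemoSegment, demo_copy_segment, demo_prepare_connect) are hypothetical and are not taken from the HAWQ source, and this is not necessarily how the pull request resolves the issue.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a segment descriptor; not HAWQ's real struct. */
typedef struct DemoSegment {
    int  port;
    char hostname[64];
} DemoSegment;

/*
 * Copy a segment descriptor.
 * Returns NULL when src is NULL (or when allocation fails) instead of
 * passing a null pointer to memcpy, which is what the backtrace shows.
 */
DemoSegment *demo_copy_segment(const DemoSegment *src)
{
    DemoSegment *dst;

    if (src == NULL)
        return NULL;

    dst = malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, sizeof(*dst));
    return dst;
}

/*
 * Hypothetical caller: returns -1 when no segment is available, so the
 * caller can report an error rather than crash. Returns 0 on success and
 * stores the heap-allocated copy in *out (the caller frees it).
 */
int demo_prepare_connect(const DemoSegment *segment, DemoSegment **out)
{
    *out = demo_copy_segment(segment);
    return (*out != NULL) ? 0 : -1;
}
```

With such a check, a caller that ends up with no segment after the QE was killed would get an error code back and could raise a normal query error, assuming the surrounding dispatcher code is prepared to handle that return value.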



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
