[
https://issues.apache.org/jira/browse/HAWQ-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217624#comment-15217624
]
ASF GitHub Bot commented on HAWQ-592:
-------------------------------------
Github user ztao1987 commented on the pull request:
https://github.com/apache/incubator-hawq/pull/532#issuecomment-203309240
+1
> QD fails when connects to QE again in executormgr_allocate_any_executor()
> -------------------------------------------------------------------------
>
> Key: HAWQ-592
> URL: https://issues.apache.org/jira/browse/HAWQ-592
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Dispatcher
> Affects Versions: 2.0.0
> Reporter: Chunling Wang
> Assignee: Lili Ma
>
> We first run a query so that the QD acquires some QEs. We then kill one QE and
> run "set log_min_messages=DEBUG1" so that the QD reaches
> executormgr_allocate_any_executor(). At that point the QD fails.
> 1. Run a query to get some QEs.
> {code}
> dispatch=# select count(*) from test_dispatch as t1, test_dispatch as t2,
> test_dispatch as t3 where t1.id *2 = t2.id and t1.id < t3.id;
> count
> -------
> 3725
> (1 row)
> {code}
> {code}
> $ ps -ef|grep postgres
> 501 12817 1 0 4:41下午 ?? 0:00.36 /usr/local/hawq/bin/postgres
> -D /Users/wangchunling/hawq-data-directory/masterdd -i -M master -p 5432
> --silent-mode=true
> 501 12818 12817 0 4:41下午 ?? 0:00.01 postgres: port 5432, master
> logger process
> 501 12821 12817 0 4:41下午 ?? 0:00.00 postgres: port 5432, stats
> collector process
> 501 12822 12817 0 4:41下午 ?? 0:00.03 postgres: port 5432, writer
> process
> 501 12823 12817 0 4:41下午 ?? 0:00.00 postgres: port 5432,
> checkpoint process
> 501 12824 12817 0 4:41下午 ?? 0:00.00 postgres: port 5432,
> seqserver process
> 501 12825 12817 0 4:41下午 ?? 0:00.00 postgres: port 5432, WAL
> Send Server process
> 501 12826 12817 0 4:41下午 ?? 0:00.00 postgres: port 5432, DFS
> Metadata Cache process
> 501 12827 12817 0 4:41下午 ?? 0:00.16 postgres: port 5432, master
> resource manager
> 501 12844 1 0 4:41下午 ?? 0:00.57 /usr/local/hawq/bin/postgres
> -D /Users/wangchunling/hawq-data-directory/segmentdd -i -M segment -p 40000
> --silent-mode=true
> 501 12845 12844 0 4:41下午 ?? 0:00.01 postgres: port 40000, logger
> process
> 501 12856 12862 0 4:42下午 ?? 0:00.05 postgres: port 5432,
> wangchunling dispatch [local] con13 cmd10 idle [local]
> 501 12872 12844 0 4:42下午 ?? 0:00.00 postgres: port 40000, stats
> collector process
> 501 12873 12844 0 4:42下午 ?? 0:00.01 postgres: port 40000, writer
> process
> 501 12874 12844 0 4:42下午 ?? 0:00.00 postgres: port 40000,
> checkpoint process
> 501 12875 12844 0 4:42下午 ?? 0:00.03 postgres: port 40000,
> segment resource manager
> {code}
> 2. Kill a QE with "kill -9" and wait for the segment to come back up.
> {code}
> $ ps -ef|grep postgres
> 501 12817 1 0 4:41下午 ?? 0:00.91 /usr/local/hawq/bin/postgres
> -D /Users/wangchunling/hawq-data-directory/masterdd -i -M master -p 5432
> --silent-mode=true
> 501 12818 12817 0 4:41下午 ?? 0:00.05 postgres: port 5432, master
> logger process
> 501 12844 1 0 4:41下午 ?? 0:01.52 /usr/local/hawq/bin/postgres
> -D /Users/wangchunling/hawq-data-directory/segmentdd -i -M segment -p 40000
> --silent-mode=true
> 501 12845 12844 0 4:41下午 ?? 0:00.04 postgres: port 40000, logger
> process
> 501 12872 12844 0 4:42下午 ?? 0:00.02 postgres: port 40000, stats
> collector process
> 501 12873 12844 0 4:42下午 ?? 0:00.19 postgres: port 40000, writer
> process
> 501 12874 12844 0 4:42下午 ?? 0:00.03 postgres: port 40000,
> checkpoint process
> 501 12875 12844 0 4:42下午 ?? 0:00.41 postgres: port 40000,
> segment resource manager
> 501 12932 12817 0 4:52下午 ?? 0:00.00 postgres: port 5432, stats
> collector process
> 501 12933 12817 0 4:52下午 ?? 0:00.01 postgres: port 5432, writer
> process
> 501 12934 12817 0 4:52下午 ?? 0:00.00 postgres: port 5432,
> checkpoint process
> 501 12935 12817 0 4:52下午 ?? 0:00.00 postgres: port 5432,
> seqserver process
> 501 12936 12817 0 4:52下午 ?? 0:00.00 postgres: port 5432, WAL
> Send Server process
> 501 12937 12817 0 4:52下午 ?? 0:00.00 postgres: port 5432, DFS
> Metadata Cache process
> 501 12938 12817 0 4:52下午 ?? 0:00.04 postgres: port 5432, master
> resource manager
> 501 12952 12817 0 4:53下午 ?? 0:00.00 postgres: port 5432,
> wangchunling dispatch [local] con30 idle [local]
> {code}
> {code}
> dispatch=# select * from gp_segment_configuration;
> registration_order | role | status | port | hostname |
> address | description
> --------------------+------+--------+-------+-----------------------------+-----------------------------+------------------------------------
> 0 | m | u | 5432 | ChunlingdeMacBook-Pro.local |
> ChunlingdeMacBook-Pro.local |
> 1 | p | d | 40000 | localhost |
> 127.0.0.1 | resource manager process was reset
> (2 rows)
> dispatch=# select * from gp_segment_configuration;
> registration_order | role | status | port | hostname |
> address | description
> --------------------+------+--------+-------+-----------------------------+-----------------------------+-------------
> 0 | m | u | 5432 | ChunlingdeMacBook-Pro.local |
> ChunlingdeMacBook-Pro.local |
> 1 | p | u | 40000 | localhost |
> 127.0.0.1 |
> (2 rows)
> {code}
> 3. Run "set log_min_messages=DEBUG1" and observe that the QD fails.
> {code}
> dispatch=# set log_min_messages=DEBUG1;
> The connection to the server was lost. Attempting reset: Failed.
> !>
> {code}
> The backtrace when QD fails:
> {code}
> * thread #1: tid = 0x2ff2e7, 0x00007fff87d60380
> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 64, queue =
> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
> * frame #0: 0x00007fff87d60380
> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 64
> frame #1: 0x00007fff8a0a82e2 libsystem_c.dylib`__memcpy_chk + 22
> frame #2: 0x000000010c299469 postgres`CopySegment(src=0x0000000000000000,
> cxt=0x00007fa303d07d90) + 137 at cdbutil.c:168
> frame #3: 0x000000010c2c0df2
> postgres`executormgr_prepare_connect(segment=0x0000000000000000,
> is_writer='\x01') + 34 at executormgr.c:983
> frame #4: 0x000000010c2bec26
> postgres`dispmgt_build_preconnect_info(segment=0x0000000000000000,
> is_writer='\x01', executor=0x00007fa3048787d8, data=0x00007fa304877e30,
> slice=0x00007fa3048781a0, task=0x00007fa3048781c0) + 182 at
> dispatcher_mgt.c:568
> frame #5: 0x000000010c2b9ec4
> postgres`dispatcher_bind_executor(data=0x00007fa304877e30) + 244 at
> dispatcher.c:956
> frame #6: 0x000000010c2b9bbb
> postgres`dispatch_run(data=0x00007fa304877e30) + 219 at dispatcher.c:1237
> frame #7: 0x000000010c2bb2bf
> postgres`dispatch_statement(stmt=0x00007fff540b96e8,
> resource=0x0000000000000000, result=0x0000000000000000) + 271 at
> dispatcher.c:1491
> frame #8: 0x000000010c2bb1a3
> postgres`dispatch_statement_string(string=0x00007fa3068e4e40,
> serializeQuerytree=0x0000000000000000, serializeLenQuerytree=0,
> resource=0x0000000000000000, result=0x0000000000000000,
> sync_on_all_executors='\x01') + 307 at dispatcher.c:1537
> frame #9: 0x000000010c12aad9
> postgres`SetPGVariableDispatch(name=0x00007fa30401c120,
> args=0x00007fa30401c1d8, is_local='\0') + 713 at guc.c:10891
> frame #10: 0x000000010bfe7e7e
> postgres`ProcessUtility(parsetree=0x00007fa30401c208,
> queryString=0x00007fa3068d3e30, params=0x0000000000000000, isTopLevel='\x01',
> dest=0x00007fa30401c568, completionTag=0x00007fff540b9f90) + 8318 at
> utility.c:1519
> frame #11: 0x000000010bfe5810
> postgres`PortalRunUtility(portal=0x00007fa304821430,
> utilityStmt=0x00007fa30401c208, isTopLevel='\x01', dest=0x00007fa30401c568,
> completionTag=0x00007fff540b9f90) + 464 at pquery.c:1896
> frame #12: 0x000000010bfe3e4b
> postgres`PortalRunMulti(portal=0x00007fa304821430, isTopLevel='\x01',
> dest=0x00007fa30401c568, altdest=0x00007fa30401c568,
> completionTag=0x00007fff540b9f90) + 539 at pquery.c:2006
> frame #13: 0x000000010bfe33b5
> postgres`PortalRun(portal=0x00007fa304821430, count=9223372036854775807,
> isTopLevel='\x01', dest=0x00007fa30401c568, altdest=0x00007fa30401c568,
> completionTag=0x00007fff540b9f90) + 1269 at pquery.c:1523
> frame #14: 0x000000010bfd9703
> postgres`exec_simple_query(query_string=0x00007fa30401b830,
> seqServerHost=0x0000000000000000, seqServerPort=-1) + 2179 at postgres.c:1745
> frame #15: 0x000000010bfd7b50 postgres`PostgresMain(argc=4,
> argv=0x00007fa30680ba10, username=0x00007fa30680b9d0) + 7472 at
> postgres.c:4754
> frame #16: 0x000000010bf7bfd6
> postgres`BackendRun(port=0x00007fa303c18c50) + 1014 at postmaster.c:5889
> frame #17: 0x000000010bf7b121
> postgres`BackendStartup(port=0x00007fa303c18c50) + 385 at postmaster.c:5484
> frame #18: 0x000000010bf77d90 postgres`ServerLoop + 1312 at
> postmaster.c:2163
> frame #19: 0x000000010bf763d3 postgres`PostmasterMain(argc=9,
> argv=0x00007fa303d07a60) + 4931 at postmaster.c:1454
> frame #20: 0x000000010be80af2 postgres`main(argc=9,
> argv=0x00007fa303d07a60) + 978 at main.c:226
> frame #21: 0x00007fff95e8c5c9 libdyld.dylib`start + 1
> thread #2: tid = 0x2ff2e8, 0x00007fff890355be libsystem_kernel.dylib`poll +
> 10
> frame #0: 0x00007fff890355be libsystem_kernel.dylib`poll + 10
> frame #1: 0x000000010c1e3fed
> postgres`rxThreadFunc(arg=0x0000000000000000) + 317 at ic_udp.c:6251
> frame #2: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
> frame #3: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
> frame #4: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13
> thread #4: tid = 0x2ff41f, 0x00007fff890343f6
> libsystem_kernel.dylib`__select + 10
> frame #0: 0x00007fff890343f6 libsystem_kernel.dylib`__select + 10
> frame #1: 0x000000010c2c6acb postgres`pg_usleep(microsec=1000000) + 91 at
> pgsleep.c:43
> frame #2: 0x000000010c1799ca
> postgres`generateResourceRefreshHeartBeat(arg=0x00007fa303e03380) + 1482 at
> rmcomm_QD2RM.c:1546
> frame #3: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
> frame #4: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
> frame #5: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13
> {code}
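
In the backtrace above, frames #2 through #4 all carry a NULL segment pointer (segment=0x0, src=0x0), and the crash is a memcpy from address 0x0 inside CopySegment(). The sketch below illustrates one way such a failure can be turned into a reportable error: check the source pointer before copying. It is a minimal illustration only, not the actual HAWQ code or the fix in the pull request. DemoSegment, demo_copy_segment() and demo_prepare_connect() are hypothetical names, and the real segment structure and memory-context handling differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a segment descriptor; the real HAWQ struct differs. */
typedef struct DemoSegment {
    int  port;
    char hostname[64];
} DemoSegment;

/*
 * Guarded copy: returns NULL when there is nothing to copy (or when
 * allocation fails) instead of passing a NULL source to memcpy.
 */
static DemoSegment *demo_copy_segment(const DemoSegment *src)
{
    if (src == NULL)
        return NULL;

    DemoSegment *dst = malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, sizeof(*dst));
    return dst;
}

/*
 * Hypothetical caller: reports failure to its own caller so that the
 * dispatcher could raise an error rather than crash the process.
 * On success the caller owns *out and must free() it.
 */
static bool demo_prepare_connect(const DemoSegment *segment, DemoSegment **out)
{
    assert(out != NULL);
    *out = demo_copy_segment(segment);
    return *out != NULL;
}
```

With a guard of this kind, a missing segment would surface as a false return value that the caller can translate into an ordinary error. Where the check belongs in the real call chain (in the copy routine, in its callers, or earlier when the executor is bound) is a design decision this sketch does not settle.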