[
https://issues.apache.org/jira/browse/HAWQ-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Li resolved HAWQ-272.
--------------------------
Resolution: Fixed
Fix Version/s: 2.0.0-beta-incubating
> Segment status will not be down after killing postmaster process of segment
> ----------------------------------------------------------------------------
>
> Key: HAWQ-272
> URL: https://issues.apache.org/jira/browse/HAWQ-272
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Fault Tolerance
> Reporter: Dong Li
> Assignee: Lin Wen
> Fix For: 2.0.0-beta-incubating
>
>
> On a cluster that already has running QEs, if you kill the postmaster
> process of the segment (pid=59335), queries still work and the segment's
> status in gp_segment_configuration remains up.
> {code}
> ps -ef |grep postgres
> 502 59309 1 0 10:07AM ?? 0:05.39
> /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D
> /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432
> --silent-mode=true
> 502 59310 59309 0 10:07AM ?? 0:00.38 postgres: port 5432, master
> logger process
> 502 59313 59309 0 10:07AM ?? 0:00.16 postgres: port 5432, stats
> collector process
> 502 59314 59309 0 10:07AM ?? 0:01.89 postgres: port 5432, writer
> process
> 502 59315 59309 0 10:07AM ?? 0:00.27 postgres: port 5432,
> checkpoint process
> 502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432,
> seqserver process
> 502 59317 59309 0 10:07AM ?? 0:00.29 postgres: port 5432, WAL
> Send Server process
> 502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS
> Metadata Cache process
> 502 59319 59309 0 10:07AM ?? 0:10.02 postgres: port 5432, master
> resource manager
> 502 59335 1 0 10:07AM ?? 0:12.94
> /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D
> /Users/intern/hawq-data-directory/segmentdd -i -M segment -p 40000
> --silent-mode=true
> 502 59336 59335 0 10:07AM ?? 0:00.61 postgres: port 40000, logger
> process
> 502 59403 59309 0 10:07AM ?? 0:02.28 postgres: port 5432, intern
> intern [local] con11 cmd63 idle [local]
> 502 63451 59335 0 10:25AM ?? 0:00.12 postgres: port 40000, stats
> collector process
> 502 63452 59335 0 10:25AM ?? 0:01.43 postgres: port 40000, writer
> process
> 502 63453 59335 0 10:25AM ?? 0:00.20 postgres: port 40000,
> checkpoint process
> 502 63454 59335 0 10:25AM ?? 0:03.64 postgres: port 40000,
> segment resource manager
> 502 63966 59335 0 10:27AM ?? 0:04.88 postgres: port 40000, intern
> intern 127.0.0.1(56871) con11 seg0 idle
> 502 63967 59335 0 10:27AM ?? 0:04.90 postgres: port 40000, intern
> intern 127.0.0.1(56873) con11 seg1 idle
> 502 63968 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern
> intern 127.0.0.1(56875) con11 seg2 idle
> 502 63969 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern
> intern 127.0.0.1(56877) con11 seg3 idle
> 502 63970 59335 0 10:27AM ?? 0:04.89 postgres: port 40000, intern
> intern 127.0.0.1(56879) con11 seg4 idle
> 502 63971 59335 0 10:27AM ?? 0:04.86 postgres: port 40000, intern
> intern 127.0.0.1(56881) con11 seg5 idle
> kill -9 59335
> ps -ef |grep postgres
> 502 59309 1 0 10:07AM ?? 0:05.64
> /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D
> /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432
> --silent-mode=true
> 502 59310 59309 0 10:07AM ?? 0:00.40 postgres: port 5432, master
> logger process
> 502 59313 59309 0 10:07AM ?? 0:00.17 postgres: port 5432, stats
> collector process
> 502 59314 59309 0 10:07AM ?? 0:02.01 postgres: port 5432, writer
> process
> 502 59315 59309 0 10:07AM ?? 0:00.28 postgres: port 5432,
> checkpoint process
> 502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432,
> seqserver process
> 502 59317 59309 0 10:07AM ?? 0:00.31 postgres: port 5432, WAL
> Send Server process
> 502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS
> Metadata Cache process
> 502 59319 59309 0 10:07AM ?? 0:10.64 postgres: port 5432, master
> resource manager
> 502 59336 1 0 10:07AM ?? 0:00.64 postgres: port 40000, logger
> process
> 502 59403 59309 0 10:07AM ?? 0:02.40 postgres: port 5432, intern
> intern [local] con11 cmd67 idle [local]
> 502 63454 1 0 10:25AM ?? 0:03.96 postgres: port 40000,
> segment resource manager
> 502 63966 1 0 10:27AM ?? 0:04.96 postgres: port 40000, intern
> intern 127.0.0.1(56871) con11 seg0 idle
> 502 63967 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern
> intern 127.0.0.1(56873) con11 seg1 idle
> 502 63968 1 0 10:27AM ?? 0:07.20 postgres: port 40000, intern
> intern 127.0.0.1(56875) con11 seg2 idle
> 502 63969 1 0 10:27AM ?? 0:07.21 postgres: port 40000, intern
> intern 127.0.0.1(56877) con11 seg3 idle
> 502 63970 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern
> intern 127.0.0.1(56879) con11 seg4 idle
> 502 63971 1 0 10:27AM ?? 0:04.94 postgres: port 40000, intern
> intern 127.0.0.1(56881) con11 seg5 idle
> {code}
> Then we execute an INSERT statement.
> {code}
> intern=# select count(*) from b;
> count
> ----------
> 41058000
> (1 row)
> intern=# insert into b VALUES (1);
> INSERT 0 1
> intern=# select count(*) from b;
> count
> ----------
> 41058001
> (1 row)
> intern=# select * from gp_segment_configuration ;
>  registration_order | role | status | port  |  hostname  |  address
> --------------------+------+--------+-------+------------+------------
>                   0 | m    | u      |  5432 | doli.local | doli.local
>                   1 | p    | u      | 40000 | localhost  | 127.0.0.1
> (2 rows)
> {code}
> If the existing QEs are enough to execute the query, it succeeds. Otherwise
> it asks the postmaster to create a new QE, finds that the postmaster is not
> alive, and marks the segment as down.
> The problem is that we should check the live state of the segment's
> postmaster process.
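
To illustrate the kind of liveness check the description asks for, here is a minimal sketch in Python. It is not the fix that was committed for HAWQ-272. It assumes a POSIX system and assumes that the segment data directory contains a PostgreSQL-style postmaster.pid file whose first line is the postmaster PID. The function names pid_is_alive and postmaster_is_alive are hypothetical.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 performs only the existence/permission check;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but is owned by another user
    return True


def postmaster_is_alive(data_dir: str) -> bool:
    """Check the postmaster recorded in <data_dir>/postmaster.pid.

    Assumption: the first line of postmaster.pid is the postmaster PID.
    """
    pid_file = Path(data_dir) / "postmaster.pid"
    try:
        first_line = pid_file.read_text().splitlines()[0]
        pid = int(first_line.strip())
    except (OSError, IndexError, ValueError):
        return False  # missing, empty, or malformed PID file
    return pid_is_alive(pid)
```

With the layout shown in the ps output above, a periodic check could call postmaster_is_alive("/Users/intern/hawq-data-directory/segmentdd") and report the segment as down when it returns False. A check of this kind can be fooled by PID reuse, so treat it as a sketch rather than a complete design.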