Dong Li created HAWQ-272:
----------------------------
Summary: Segment status will not be down after killing postmaster
process of segment
Key: HAWQ-272
URL: https://issues.apache.org/jira/browse/HAWQ-272
Project: Apache HAWQ
Issue Type: Bug
Components: Fault Tolerance
Reporter: Dong Li
Assignee: Lei Chang
On a cluster that already has running QEs (query executors), if you kill the
postmaster process of the segment (pid=59335), the cluster can still serve
queries and the segment's status in gp_segment_configuration remains up.
{code}
ps -ef |grep postgres
502 59309 1 0 10:07AM ?? 0:05.39 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
502 59310 59309 0 10:07AM ?? 0:00.38 postgres: port 5432, master logger process
502 59313 59309 0 10:07AM ?? 0:00.16 postgres: port 5432, stats collector process
502 59314 59309 0 10:07AM ?? 0:01.89 postgres: port 5432, writer process
502 59315 59309 0 10:07AM ?? 0:00.27 postgres: port 5432, checkpoint process
502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432, seqserver process
502 59317 59309 0 10:07AM ?? 0:00.29 postgres: port 5432, WAL Send Server process
502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS Metadata Cache process
502 59319 59309 0 10:07AM ?? 0:10.02 postgres: port 5432, master resource manager
502 59335 1 0 10:07AM ?? 0:12.94 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/segmentdd -i -M segment -p 40000 --silent-mode=true
502 59336 59335 0 10:07AM ?? 0:00.61 postgres: port 40000, logger process
502 59403 59309 0 10:07AM ?? 0:02.28 postgres: port 5432, intern intern [local] con11 cmd63 idle [local]
502 63451 59335 0 10:25AM ?? 0:00.12 postgres: port 40000, stats collector process
502 63452 59335 0 10:25AM ?? 0:01.43 postgres: port 40000, writer process
502 63453 59335 0 10:25AM ?? 0:00.20 postgres: port 40000, checkpoint process
502 63454 59335 0 10:25AM ?? 0:03.64 postgres: port 40000, segment resource manager
502 63966 59335 0 10:27AM ?? 0:04.88 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
502 63967 59335 0 10:27AM ?? 0:04.90 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
502 63968 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
502 63969 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
502 63970 59335 0 10:27AM ?? 0:04.89 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
502 63971 59335 0 10:27AM ?? 0:04.86 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
kill -9 59335
ps -ef |grep postgres
502 59309 1 0 10:07AM ?? 0:05.64 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
502 59310 59309 0 10:07AM ?? 0:00.40 postgres: port 5432, master logger process
502 59313 59309 0 10:07AM ?? 0:00.17 postgres: port 5432, stats collector process
502 59314 59309 0 10:07AM ?? 0:02.01 postgres: port 5432, writer process
502 59315 59309 0 10:07AM ?? 0:00.28 postgres: port 5432, checkpoint process
502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432, seqserver process
502 59317 59309 0 10:07AM ?? 0:00.31 postgres: port 5432, WAL Send Server process
502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS Metadata Cache process
502 59319 59309 0 10:07AM ?? 0:10.64 postgres: port 5432, master resource manager
502 59336 1 0 10:07AM ?? 0:00.64 postgres: port 40000, logger process
502 59403 59309 0 10:07AM ?? 0:02.40 postgres: port 5432, intern intern [local] con11 cmd67 idle [local]
502 63454 1 0 10:25AM ?? 0:03.96 postgres: port 40000, segment resource manager
502 63966 1 0 10:27AM ?? 0:04.96 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
502 63967 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
502 63968 1 0 10:27AM ?? 0:07.20 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
502 63969 1 0 10:27AM ?? 0:07.21 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
502 63970 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
502 63971 1 0 10:27AM ?? 0:04.94 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
{code}
Then we execute an INSERT statement, and it succeeds:
{code}
intern=# select count(*) from b;
count
----------
41058000
(1 row)
intern=# insert into b VALUES (1);
INSERT 0 1
intern=# select count(*) from b;
count
----------
41058001
(1 row)
intern=# select * from gp_segment_configuration ;
 registration_order | role | status | port  |  hostname  |  address
--------------------+------+--------+-------+------------+------------
                  0 | m    | u      |  5432 | doli.local | doli.local
                  1 | p    | u      | 40000 | localhost  | 127.0.0.1
(2 rows)
{code}
If the existing QEs are enough to execute the query, it succeeds. Otherwise
the query has to ask the segment's postmaster to create new QEs; only then is
the postmaster found to be dead and the segment marked as down.
The problem is that the liveness of the segment's postmaster process is never
checked directly, so the segment keeps reporting "up" until that point.
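As a rough illustration of such a liveness check (a sketch only, not the HAWQ
fix), the snippet below tests whether the postmaster that owns a data
directory is still running. It assumes the segment follows the PostgreSQL
convention of writing the postmaster PID on the first line of
postmaster.pid in the data directory; the function name postmaster_alive
is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of HAWQ: succeed (exit 0) only if the
# postmaster that owns the given data directory is still running.
# Assumption: <datadir>/postmaster.pid exists and its first line is the
# postmaster PID, as in stock PostgreSQL.
postmaster_alive() {
    datadir=$1
    pidfile="$datadir/postmaster.pid"

    # No readable pid file: treat the postmaster as not running.
    [ -r "$pidfile" ] || return 1

    # Read the first line, which should hold the PID.
    pid=$(head -n 1 "$pidfile")

    # Reject empty or non-numeric content.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # kill -0 delivers no signal; it only checks that the process exists
    # and that the caller is allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

# Example: check the segment data directory from this report.
if postmaster_alive /Users/intern/hawq-data-directory/segmentdd; then
    echo "segment postmaster is alive"
else
    echo "segment postmaster is NOT alive"
fi
```

After a kill -9 the pid file is left behind, so the check relies on the
kill -0 probe rather than on the file's existence. It can still be fooled
if the PID has been reused by an unrelated process, and it reports "not
alive" when the caller lacks permission to signal the postmaster, so a real
fix inside the resource manager would need a stronger test.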
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)