Riza Suminto created IMPALA-13860:
-------------------------------------
Summary: DCHECK hit in cluster-membership-mgr.cc
Key: IMPALA-13860
URL: https://issues.apache.org/jira/browse/IMPALA-13860
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Riza Suminto
A DCHECK is hit in cluster-membership-mgr.cc when enabling Impala graceful
shutdown in test_coord_only_pool_exec_groups. The following is the log from
the crashing impalad during shutdown.
{noformat}
I0313 03:34:10.643050 3569793 init.cc:260] Shutdown signal received. Current
Shutdown Status: shutdown grace period left: 0, deadline left: 1m, cancel
deadline left: 48s000ms, queries registered on coordinator: 0, queries
executing: 0, fragment instances: 0
I0313 03:34:10.685133 3569791 cluster-membership-mgr.cc:247] Processing
statestore update
I0313 03:34:10.685143 3569791 cluster-membership-mgr.cc:248] Local backend
membership needs update
I0313 03:34:10.685145 3569791 cluster-membership-mgr.cc:264] Received delta
membership update
I0313 03:34:10.685168 3569791 cluster-membership-mgr.cc:373] Removing backend
054b8dbd87ae1d41:dea01c0c3357a890 from group name:
"root.group-set-small-group-000"
min_size: 1
(quiescing)
I0313 03:34:10.685184 3569791 cluster-membership-mgr.cc:65] Removing empty
group name: "root.group-set-small-group-000"
min_size: 1
I0313 03:34:10.685212 3569791 cluster-membership-mgr.cc:433] Removing local
backend from group name: "root.group-set-small-group-000"
min_size: 2
F0313 03:34:10.685215 3569791 cluster-membership-mgr.cc:61] Check failed: it !=
executor_groups->end()
Minidump in thread [3569791]StatestoreSubscriber-2 running query
0000000000000000:0000000000000000, fragment instance
0000000000000000:0000000000000000
Wrote minidump to
/data/jenkins/workspace/impala-private-basic-parameterized/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/354840e1-0cb4-41b9-0fb152aa-d09ceb81.dmp{noformat}
There are two back-to-back calls to RemoveExecutorAndGroup().
The [first call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L375]
removes backend 054b8dbd87ae1d41:dea01c0c3357a890 from group
"root.group-set-small-group-000" in new_executor_groups. The group becomes
empty after that removal, so the group itself is removed as well.
The [second call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L433]
tries to remove the local backend from new_executor_groups and hits the DCHECK
because the group name "root.group-set-small-group-000" can no longer be found.
The DCHECK in RemoveExecutorAndGroup() should be replaced by an if check and a
VLOG(1) message.
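
A minimal sketch of the proposed change, not the actual Impala code. The types
BackendId and ExecutorGroups below are simplified stand-ins (assumptions) for
the real classes, the signature and bool return value are illustrative, and
std::clog stands in for glog's VLOG(1) so the snippet compiles on its own.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for Impala's types; the real definitions differ.
using BackendId = std::string;
using ExecutorGroups = std::map<std::string, std::set<BackendId>>;

// Removes 'backend' from 'group_name' and drops the group once it is empty.
// Returns false, instead of failing a DCHECK, when the group is already gone.
bool RemoveExecutorAndGroup(const BackendId& backend,
                            const std::string& group_name,
                            ExecutorGroups* executor_groups) {
  auto it = executor_groups->find(group_name);
  if (it == executor_groups->end()) {
    // In Impala this would be a VLOG(1) message; std::clog is a stand-in.
    std::clog << "Executor group " << group_name
              << " not found, skipping removal of backend " << backend << "\n";
    return false;
  }
  it->second.erase(backend);
  // Mirror the "Removing empty group" step seen in the log above.
  if (it->second.empty()) executor_groups->erase(it);
  return true;
}
```

With this shape, the second back-to-back call for a group that the first call
already removed logs a message and returns instead of crashing the impalad.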