Riza Suminto created IMPALA-13860:
-------------------------------------

             Summary: DCHECK hit in cluster-membership-mgr.cc
                 Key: IMPALA-13860
                 URL: https://issues.apache.org/jira/browse/IMPALA-13860
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Riza Suminto


A DCHECK is hit in cluster-membership-mgr.cc when Impala graceful shutdown is 
enabled in test_coord_only_pool_exec_groups. The following log is from the 
impalad that crashed during shutdown.
{noformat}
I0313 03:34:10.643050 3569793 init.cc:260] Shutdown signal received. Current 
Shutdown Status: shutdown grace period left: 0, deadline left: 1m, cancel 
deadline left: 48s000ms, queries registered on coordinator: 0, queries 
executing: 0, fragment instances: 0
I0313 03:34:10.685133 3569791 cluster-membership-mgr.cc:247] Processing 
statestore update
I0313 03:34:10.685143 3569791 cluster-membership-mgr.cc:248] Local backend 
membership needs update
I0313 03:34:10.685145 3569791 cluster-membership-mgr.cc:264] Received delta 
membership update
I0313 03:34:10.685168 3569791 cluster-membership-mgr.cc:373] Removing backend 
054b8dbd87ae1d41:dea01c0c3357a890 from group name: 
"root.group-set-small-group-000"
min_size: 1
 (quiescing)
I0313 03:34:10.685184 3569791 cluster-membership-mgr.cc:65] Removing empty 
group name: "root.group-set-small-group-000"
min_size: 1
I0313 03:34:10.685212 3569791 cluster-membership-mgr.cc:433] Removing local 
backend from group name: "root.group-set-small-group-000"
min_size: 2
F0313 03:34:10.685215 3569791 cluster-membership-mgr.cc:61] Check failed: it != 
executor_groups->end()
Minidump in thread [3569791]StatestoreSubscriber-2 running query 
0000000000000000:0000000000000000, fragment instance 
0000000000000000:0000000000000000
Wrote minidump to 
/data/jenkins/workspace/impala-private-basic-parameterized/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/354840e1-0cb4-41b9-0fb152aa-d09ceb81.dmp{noformat}
The crash comes from two back-to-back calls to RemoveExecutorAndGroup().
The [first call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L375]
removes backend 054b8dbd87ae1d41:dea01c0c3357a890 from new_executor_groups. 
The group "root.group-set-small-group-000" becomes empty after that removal, 
so it is removed as well.
The [second call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L433]
tries to remove the local backend from new_executor_groups and hits the DCHECK 
because the group "root.group-set-small-group-000" no longer exists.
 
The DCHECK in RemoveExecutorAndGroup should be replaced by an if and VLOG(1).
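
A minimal sketch of that change, assuming a simplified model rather than the 
actual Impala code: BackendId, ExecutorGroup and ExecutorGroups are stand-in 
types, the bool return value is added only for illustration, and std::clog 
stands in for VLOG(1). The real signature and types in 
cluster-membership-mgr.cc may differ.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Simplified stand-ins for Impala's types; these are NOT the real definitions.
using BackendId = std::string;
using ExecutorGroup = std::set<BackendId>;
using ExecutorGroups = std::map<std::string, ExecutorGroup>;

// Hypothetical tolerant variant of RemoveExecutorAndGroup().
// Returns true if the group was found and the backend removed from it,
// false if the group was already gone (for example, removed by an earlier call).
bool RemoveExecutorAndGroup(const BackendId& backend,
    const std::string& group_name, ExecutorGroups* executor_groups) {
  auto it = executor_groups->find(group_name);
  if (it == executor_groups->end()) {
    // Previously: DCHECK(it != executor_groups->end());
    // In Impala this would be a VLOG(1) message; std::clog stands in here.
    std::clog << "Executor group '" << group_name
              << "' not found, skipping removal of backend " << backend << "\n";
    return false;
  }
  it->second.erase(backend);
  // Drop the group once its last backend is gone.
  if (it->second.empty()) executor_groups->erase(it);
  return true;
}
```

With this shape, a second call for a group that the first call already 
removed logs a message and returns instead of aborting the process.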


