[
https://issues.apache.org/jira/browse/IMPALA-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Riza Suminto resolved IMPALA-13860.
-----------------------------------
Fix Version/s: Impala 5.0.0
Resolution: Fixed
> DCHECK hit in cluster-membership-mgr.cc
> ---------------------------------------
>
> Key: IMPALA-13860
> URL: https://issues.apache.org/jira/browse/IMPALA-13860
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> A DCHECK is hit in cluster-membership-mgr.cc when Impala graceful shutdown is
> enabled in test_coord_only_pool_exec_groups. The following log is from the
> impalad that crashed during shutdown.
> {noformat}
> I0313 03:34:10.643050 3569793 init.cc:260] Shutdown signal received. Current
> Shutdown Status: shutdown grace period left: 0, deadline left: 1m, cancel
> deadline left: 48s000ms, queries registered on coordinator: 0, queries
> executing: 0, fragment instances: 0
> I0313 03:34:10.685133 3569791 cluster-membership-mgr.cc:247] Processing
> statestore update
> I0313 03:34:10.685143 3569791 cluster-membership-mgr.cc:248] Local backend
> membership needs update
> I0313 03:34:10.685145 3569791 cluster-membership-mgr.cc:264] Received delta
> membership update
> I0313 03:34:10.685168 3569791 cluster-membership-mgr.cc:373] Removing backend
> 054b8dbd87ae1d41:dea01c0c3357a890 from group name:
> "root.group-set-small-group-000"
> min_size: 1
> (quiescing)
> I0313 03:34:10.685184 3569791 cluster-membership-mgr.cc:65] Removing empty
> group name: "root.group-set-small-group-000"
> min_size: 1
> I0313 03:34:10.685212 3569791 cluster-membership-mgr.cc:433] Removing local
> backend from group name: "root.group-set-small-group-000"
> min_size: 2
> F0313 03:34:10.685215 3569791 cluster-membership-mgr.cc:61] Check failed: it
> != executor_groups->end()
> Minidump in thread [3569791]StatestoreSubscriber-2 running query
> 0000000000000000:0000000000000000, fragment instance
> 0000000000000000:0000000000000000
> Wrote minidump to
> /data/jenkins/workspace/impala-private-basic-parameterized/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/354840e1-0cb4-41b9-0fb152aa-d09ceb81.dmp{noformat}
> There are two back-to-back calls to
> RemoveExecutorAndGroup().
> The [first
> call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L375]
> removes backend 054b8dbd87ae1d41:dea01c0c3357a890 from new_executor_groups and
> then removes group "root.group-set-small-group-000", which becomes empty once
> that backend is gone.
> The [second
> call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L433]
> removes the local backend from new_executor_groups and hits the DCHECK because
> the group name "root.group-set-small-group-000" can no longer be found.
>
> The DCHECK in RemoveExecutorAndGroup should be replaced by an if check that
> logs at VLOG(1) and returns when the group is not found.
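
A minimal sketch of the proposed change, not the actual patch. The types, the function name RemoveExecutorAndGroupSketch, and its signature are hypothetical stand-ins (the real RemoveExecutorAndGroup() signature is not shown in this issue), and std::clog stands in for glog's VLOG(1) so the snippet compiles on its own.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for illustration only; these are not Impala's types.
using BackendId = std::string;
using ExecutorGroups =
    std::unordered_map<std::string, std::unordered_set<BackendId>>;

// Removes `backend` from the group named `group_name` and drops the group if
// it becomes empty. Returns false, instead of crashing, when the group has
// already been removed by an earlier call.
bool RemoveExecutorAndGroupSketch(const BackendId& backend,
    const std::string& group_name, ExecutorGroups* executor_groups) {
  auto it = executor_groups->find(group_name);
  if (it == executor_groups->end()) {
    // Previously: DCHECK(it != executor_groups->end());
    // In Impala this message would go through VLOG(1); std::clog is an
    // assumption made to keep the sketch free of external dependencies.
    std::clog << "Executor group '" << group_name
              << "' not found, skipping removal of backend " << backend << "\n";
    return false;
  }
  it->second.erase(backend);
  // Drop the group once its last executor is gone.
  if (it->second.empty()) executor_groups->erase(it);
  return true;
}
```

With this shape, the second back-to-back call described above finds no group, logs the message, and returns instead of aborting the process.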
--
This message was sent by Atlassian Jira
(v8.20.10#820010)