[ 
https://issues.apache.org/jira/browse/IMPALA-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13860.
-----------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

> DCHECK hit in cluster-membership-mgr.cc
> ---------------------------------------
>
>                 Key: IMPALA-13860
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13860
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> A DCHECK is hit in cluster-membership-mgr.cc when enabling Impala graceful 
> shutdown in test_coord_only_pool_exec_groups. The following is the log from 
> the impalad that crashed during shutdown.
> {noformat}
> I0313 03:34:10.643050 3569793 init.cc:260] Shutdown signal received. Current 
> Shutdown Status: shutdown grace period left: 0, deadline left: 1m, cancel 
> deadline left: 48s000ms, queries registered on coordinator: 0, queries 
> executing: 0, fragment instances: 0
> I0313 03:34:10.685133 3569791 cluster-membership-mgr.cc:247] Processing 
> statestore update
> I0313 03:34:10.685143 3569791 cluster-membership-mgr.cc:248] Local backend 
> membership needs update
> I0313 03:34:10.685145 3569791 cluster-membership-mgr.cc:264] Received delta 
> membership update
> I0313 03:34:10.685168 3569791 cluster-membership-mgr.cc:373] Removing backend 
> 054b8dbd87ae1d41:dea01c0c3357a890 from group name: 
> "root.group-set-small-group-000"
> min_size: 1
>  (quiescing)
> I0313 03:34:10.685184 3569791 cluster-membership-mgr.cc:65] Removing empty 
> group name: "root.group-set-small-group-000"
> min_size: 1
> I0313 03:34:10.685212 3569791 cluster-membership-mgr.cc:433] Removing local 
> backend from group name: "root.group-set-small-group-000"
> min_size: 2
> F0313 03:34:10.685215 3569791 cluster-membership-mgr.cc:61] Check failed: it 
> != executor_groups->end()
> Minidump in thread [3569791]StatestoreSubscriber-2 running query 
> 0000000000000000:0000000000000000, fragment instance 
> 0000000000000000:0000000000000000
> Wrote minidump to 
> /data/jenkins/workspace/impala-private-basic-parameterized/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/354840e1-0cb4-41b9-0fb152aa-d09ceb81.dmp{noformat}
> There are two back-to-back calls to RemoveExecutorAndGroup().
> The [first 
> call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L375]
>  removes backend 054b8dbd87ae1d41:dea01c0c3357a890 from new_executor_groups, 
> and then removes group "root.group-set-small-group-000", which becomes empty 
> once that backend is gone.
> The [second 
> call|https://github.com/apache/impala/blob/8093c3fa6b44f7f6ec699d2dd47581401f75f363/be/src/scheduling/cluster-membership-mgr.cc#L433]
>  tries to remove the local backend from new_executor_groups and hits the 
> DCHECK because the group "root.group-set-small-group-000" no longer exists.
>  
> The DCHECK in RemoveExecutorAndGroup should be replaced by an if check that 
> logs the missing group at VLOG(1) and returns.
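
The proposed change can be illustrated with a minimal, self-contained sketch. This is not the actual Impala code: the ExecutorGroups alias, the string-based backend and group identifiers, and the bool return value are simplifications assumed for illustration, and std::clog stands in for VLOG(1) so that the example builds without glog.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the executor group map: group name -> backend ids.
// Impala's real types are different; this only models the lookup behaviour.
using ExecutorGroups = std::map<std::string, std::set<std::string>>;

// Removes `backend_id` from `group_name` and drops the group once it is empty.
// If the group is already gone, logs and returns false instead of aborting,
// which is the behaviour suggested in place of the DCHECK.
bool RemoveExecutorAndGroup(const std::string& backend_id,
                            const std::string& group_name,
                            ExecutorGroups* executor_groups) {
  auto it = executor_groups->find(group_name);
  if (it == executor_groups->end()) {
    // In Impala this would be a VLOG(1) message (assumption for this sketch).
    std::clog << "Group " << group_name << " not found, nothing to remove\n";
    return false;
  }
  it->second.erase(backend_id);
  if (it->second.empty()) executor_groups->erase(it);
  return true;
}
```

With this shape, two back-to-back calls for the same backend and group are harmless: the first call removes the backend and the now-empty group, and the second call finds no group, logs, and returns false.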



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
