Yida Wu created IMPALA-14234:
--------------------------------

             Summary: AdmissionD DCHECK hit during statestore and coordinator failover
                 Key: IMPALA-14234
                 URL: https://issues.apache.org/jira/browse/IMPALA-14234
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Yida Wu
            Assignee: Yida Wu


In certain cases, admissiond hits a DCHECK during statestore and coordinator 
failover.
Repro steps:
1. Start the minicluster with two coordinators and one global admissiond that 
allows at most one request.
{code:java}
$IMPALA_HOME/bin/start-impala-cluster.py \
  --admissiond_args='--default_pool_max_requests=1' --num_coordinators=2 \
  --enable_admission_service
{code}
2. Run long query 1 on coord 1, where it is admitted. Then run short query 2 on 
coord 2, where it is queued, and wait until query 2 times out.
3. Kill the statestored first, then kill coord 1.
4. Run short query 3 on coord 2, then start the statestored.
Sometimes admissiond hits the DCHECK, as the logs below show:
{code:java}
I0716 15:39:22.721899  5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:22.822168  5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:22.922407  5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:23.022684  5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:23.122916 11038 cluster-membership-mgr.cc:280] Local impala server needs update
I0716 15:39:23.122927 11038 cluster-membership-mgr.cc:295] Received delta membership update
I0716 15:39:23.122938 11038 admission-controller.cc:1960] Detected that coordinator c14e143286ccb6aa:be447b1fe338f587 is no longer in the cluster membership. Cancelling 1 queries for this coordinator.
I0716 15:39:23.122975 11038 admission-controller.cc:1839] ReleaseQuery for 49496fa9222bdcb1:00f17a2500000000 called with 1 unreleased backends. Releasing automatically.
F0716 15:39:23.123034  5746 admission-controller.cc:2227] Check failed: current_membership_version >= previous_membership_version (2 vs. 4)
{code}
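A note on reading the fatal line: a failed glog ">=" check prints the left-hand 
value first, so "(2 vs. 4)" means current_membership_version was 2 while 
previous_membership_version was 4, i.e. the membership version observed by 
admissiond went backwards. The Python sketch below only extracts those two 
values from such a log line so they can be compared across repro runs. The 
helper name parse_check_ge and the regular expression are illustrative 
assumptions, not part of Impala.

```python
import re

# Matches the body of a glog CHECK failure of the form
# "Check failed: <lhs> >= <rhs> (<lhs_value> vs. <rhs_value>)".
# Hypothetical helper for triage; not part of the Impala code base.
CHECK_GE_RE = re.compile(
    r"Check failed:\s+(?P<lhs>\w+)\s+>=\s+(?P<rhs>\w+)\s+"
    r"\((?P<lhs_value>-?\d+) vs\. (?P<rhs_value>-?\d+)\)"
)


def parse_check_ge(message):
    """Return {operand_name: value} for a failed '>=' check, or None."""
    # Mail archives often wrap long log lines, so collapse whitespace first.
    flat = " ".join(message.split())
    match = CHECK_GE_RE.search(flat)
    if match is None:
        return None
    return {
        match.group("lhs"): int(match.group("lhs_value")),
        match.group("rhs"): int(match.group("rhs_value")),
    }


# The fatal line from the report, including the line wrap seen in the email.
fatal = (
    "F0716 15:39:23.123034  5746 admission-controller.cc:2227] Check failed: \n"
    "current_membership_version >= previous_membership_version (2 vs. 4)"
)
versions = parse_check_ge(fatal)
print(versions)
# {'current_membership_version': 2, 'previous_membership_version': 4}
```

Running the same helper over the admissiond FATAL log from several repro 
attempts would show whether the current version is always lower than the 
previous one when the DCHECK fires.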



