Yida Wu created IMPALA-14234:
--------------------------------

             Summary: AdmissionD DCHECK hit during statestore and coordinator failover
                 Key: IMPALA-14234
                 URL: https://issues.apache.org/jira/browse/IMPALA-14234
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Yida Wu
            Assignee: Yida Wu

In certain cases, admissiond hits a DCHECK during statestore and coordinator failover.

Repro steps:
1. Start the minicluster with two coordinators and one global admissiond that allows at most one running request.
{code:java}
$IMPALA_HOME/bin/start-impala-cluster.py --admissiond_args='--default_pool_max_requests=1' --num_coordinators=2 --enable_admission_service
{code}
2. Run long query 1 on coord 1 so that it is admitted, then run short query 2 on coord 2 so that it is queued. Wait until query 2 times out.
3. Kill the statestored first, then kill coord 1.
4. Run short query 3 on coord 2, then start the statestored.

Sometimes admissiond hits the DCHECK, as the logs below show:
{code:java}
I0716 15:39:22.721899 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:22.822168 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:22.922407 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:23.022684 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:23.122916 11038 cluster-membership-mgr.cc:280] Local impala server needs update
I0716 15:39:23.122927 11038 cluster-membership-mgr.cc:295] Received delta membership update
I0716 15:39:23.122938 11038 admission-controller.cc:1960] Detected that coordinator c14e143286ccb6aa:be447b1fe338f587 is no longer in the cluster membership. Cancelling 1 queries for this coordinator.
I0716 15:39:23.122975 11038 admission-controller.cc:1839] ReleaseQuery for 49496fa9222bdcb1:00f17a2500000000 called with 1 unreleased backends. Releasing automatically.
F0716 15:39:23.123034 5746 admission-controller.cc:2227] Check failed: current_membership_version >= previous_membership_version (2 vs. 4)
{code}
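
The failed check expects the cluster membership version seen by admissiond never to decrease. Below is a minimal sketch of that kind of monotonicity check, to illustrate how a "2 vs. 4" failure could arise. It assumes, and this report does not confirm it, that the version sequence restarts from a low value after the source of membership updates is restarted. `MembershipVersionTracker` and `replay` are hypothetical names for illustration only and are not Impala code.

```python
class MembershipVersionTracker:
    """Hypothetical consumer that expects non-decreasing membership versions."""

    def __init__(self):
        self.previous_version = 0

    def observe(self, current_version):
        # Same shape as the failed check in the log:
        # current_membership_version >= previous_membership_version
        if current_version < self.previous_version:
            raise AssertionError(
                "Check failed: current_membership_version >= "
                f"previous_membership_version ({current_version} vs. {self.previous_version})"
            )
        self.previous_version = current_version


def replay(versions):
    """Feed a sequence of versions to a fresh tracker; return 'ok' or the failure text."""
    tracker = MembershipVersionTracker()
    try:
        for version in versions:
            tracker.observe(version)
    except AssertionError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    # Assumed scenario: versions grow to 4, then restart at a lower value.
    print(replay([1, 2, 3, 4]))
    print(replay([1, 2, 3, 4, 2]))
```

In this model, a sequence such as 1, 2, 3, 4 passes, while a sequence that drops back to 2 after reaching 4 fails with the same "(2 vs. 4)" message as in the log. Whether this is what actually happens in admissiond during the failover is not established here.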