Yida Wu created IMPALA-14605:
--------------------------------
Summary: Memory leak in global admissiond when dequeuing cancelled
queries
Key: IMPALA-14605
URL: https://issues.apache.org/jira/browse/IMPALA-14605
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Yida Wu
Assignee: Yida Wu
We have identified a memory leak scenario in the global admissiond. The issue
occurs when a query waiting in the admission queue is cancelled after a
backpressure failure, but its entry is not removed from the admission state
map during the dequeue process.
Sequence of Events:
A GetQueryStatus() call from the coordinator fails due to backpressure in
admissiond.
{code:java}
I20251203 05:01:47.795506 3938873 status.cc:129]
c0476ba9e0acf5c3:012f334b00000000] GetQueryStatus rpc failed: Remote error:
Service unavailable: GetQueryStatus request on impala.AdmissionControlService
from 127.0.0.6:43351 dropped due to backpressure. The service queue contains 5
items out of a maximum of 2147483647; memory consumption is 68.54 MB.
{code}
Consequently, the coordinator sends a cancel request for the queued query. The
CancelAdmission function sets the cancel flag in the admission state. Code
reference:
https://github.com/apache/impala/blob/master/be/src/scheduling/admission-control-service.cc#L282-L289
{code:java}
I20251203 05:11:47.975906 104 admission-control-service.cc:284]
CancelAdmission: query_id=c0476ba9e0acf5c3:012f334b00000000
{code}
The admissiond tries to dequeue the query. It correctly identifies that the
query has been cancelled.
{code:java}
I20251203 05:11:48.116552 117 admission-controller.cc:2650] Dequeued cancelled
query=c0476ba9e0acf5c3:012f334b00000000
{code}
The memory leak is in this dequeue logic. Although the admissiond recognizes
that the query is cancelled, it does not remove the query's entry from the
admission state map before finishing the dequeue, so the entry stays in the
map. Code reference:
https://github.com/apache/impala/blob/master/be/src/scheduling/admission-controller.cc#L2655-L2658
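The sequence above can be illustrated with a minimal, self-contained C++
sketch. It is not Impala's code: AdmissionState, AdmissionModel, and the
erase_cancelled switch are hypothetical names used only to model the pattern
(a cancel flag set by one path, and a dequeue path that may or may not erase
the map entry). Erasing the entry on the cancelled branch is one possible shape
of a fix, assuming nothing else still needs the entry at that point.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-query admission state (not Impala's type).
struct AdmissionState {
  bool cancelled = false;
};

// Hypothetical, simplified model of the admissiond bookkeeping:
// a FIFO admission queue plus a state map keyed by query id.
class AdmissionModel {
 public:
  // Registers the query in the state map and appends it to the queue.
  void Enqueue(const std::string& query_id) {
    state_map_[query_id] = std::make_shared<AdmissionState>();
    queue_.push_back(query_id);
  }

  // Models the cancel request: only sets the flag, the map entry stays.
  void Cancel(const std::string& query_id) {
    auto it = state_map_.find(query_id);
    if (it != state_map_.end()) it->second->cancelled = true;
  }

  // Dequeues the head of the queue. Returns true if the query is admitted,
  // false if the queue is empty or the query was cancelled.
  // erase_cancelled == false reproduces the leak: the cancelled query leaves
  // the queue but its state map entry is never removed.
  bool DequeueOne(bool erase_cancelled) {
    if (queue_.empty()) return false;
    const std::string query_id = queue_.front();
    queue_.pop_front();
    auto it = state_map_.find(query_id);
    assert(it != state_map_.end());  // every queued query has a map entry
    if (it->second->cancelled) {
      if (erase_cancelled) state_map_.erase(it);  // possible fix
      return false;
    }
    return true;  // admitted queries keep their entry in this model
  }

  std::size_t StateMapSize() const { return state_map_.size(); }

 private:
  std::deque<std::string> queue_;
  std::unordered_map<std::string, std::shared_ptr<AdmissionState>> state_map_;
};
```

In this model, enqueuing a query, cancelling it, and then calling
DequeueOne(false) leaves one entry in the state map, while DequeueOne(true)
leaves the map empty.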