Yida Wu created IMPALA-14605:
--------------------------------

             Summary: Memory leak in global admissiond when dequeuing cancelled 
queries
                 Key: IMPALA-14605
                 URL: https://issues.apache.org/jira/browse/IMPALA-14605
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 4.5.0
            Reporter: Yida Wu
            Assignee: Yida Wu


We have identified a memory leak in the global admissiond. It occurs when a 
query waiting in the admission queue is cancelled after an RPC fails due to 
backpressure, and the dequeue path then fails to remove the query's entry from 
the admission state map.

Sequence of Events:
A GetQueryStatus() call from the coordinator fails due to backpressure in the 
admissiond.
{code:java}
I20251203 05:01:47.795506 3938873 status.cc:129] 
c0476ba9e0acf5c3:012f334b00000000] GetQueryStatus rpc failed: Remote error: 
Service unavailable: GetQueryStatus request on impala.AdmissionControlService 
from 127.0.0.6:43351 dropped due to backpressure. The service queue contains 5 
items out of a maximum of 2147483647; memory consumption is 68.54 MB.
{code}
Consequently, the coordinator sends a cancel request for the queued query. The 
CancelAdmission function sets the cancel flag in the admission state, code ref: 
https://github.com/apache/impala/blob/master/be/src/scheduling/admission-control-service.cc#L282-L289
{code:java}
I20251203 05:11:47.975906  104 admission-control-service.cc:284] 
CancelAdmission: query_id=c0476ba9e0acf5c3:012f334b00000000
{code}
The admissiond tries to dequeue the query. It correctly identifies that the 
query has been cancelled.
{code:java}
I20251203 05:11:48.116552  117 admission-controller.cc:2650] Dequeued cancelled 
query=c0476ba9e0acf5c3:012f334b00000000
{code}
The leak is in this dequeue logic. Although the admissiond recognizes that the 
query is cancelled, it skips the query without removing its entry from the 
admission state map, so the entry stays in the map, code ref: 
https://github.com/apache/impala/blob/master/be/src/scheduling/admission-controller.cc#L2655-L2658
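
The sequence above can be modeled with a minimal, single-threaded C++ sketch. 
All names in it (AdmissionState, AdmissionModel, Submit, Cancel, DequeueOne, 
erase_on_cancel) are hypothetical and chosen for illustration; this is not 
Impala's actual code or the actual fix, only an assumption about the general 
shape of the problem: the cancel step sets a flag, and the dequeue step must 
also erase the map entry when it sees that flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-query admission state (not Impala's type).
struct AdmissionState {
  std::atomic<bool> cancelled{false};
};

// Hypothetical single-threaded model of the queue plus the admission state map.
class AdmissionModel {
 public:
  // Registers the query in the state map and places it in the admission queue.
  void Submit(const std::string& query_id) {
    state_map_[query_id] = std::make_shared<AdmissionState>();
    queue_.push_back(query_id);
  }

  // Models the cancel step: it only sets the flag and leaves the map untouched.
  void Cancel(const std::string& query_id) {
    auto it = state_map_.find(query_id);
    if (it != state_map_.end()) it->second->cancelled = true;
  }

  // Pops the head of the queue and returns true if that query was cancelled.
  // With erase_on_cancel == false the entry is left behind (the leak scenario);
  // with erase_on_cancel == true the entry is removed before returning.
  bool DequeueOne(bool erase_on_cancel) {
    assert(!queue_.empty());
    const std::string query_id = queue_.front();
    queue_.pop_front();
    auto it = state_map_.find(query_id);
    if (it == state_map_.end()) return false;
    if (it->second->cancelled) {
      if (erase_on_cancel) state_map_.erase(it);
      return true;
    }
    return false;  // A real service would continue with admission here.
  }

  std::size_t StateMapSize() const { return state_map_.size(); }

 private:
  std::unordered_map<std::string, std::shared_ptr<AdmissionState>> state_map_;
  std::deque<std::string> queue_;
};
```

In this model, calling Submit, then Cancel, then DequeueOne(false) leaves one 
entry in the map, while DequeueOne(true) leaves it empty. A real implementation 
would also need to hold the appropriate locks around the map and the queue, 
which the sketch omits.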



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
