Yida Wu created IMPALA-14771:
--------------------------------

             Summary: Admissiond crash in DequeueLoop caused by dangling ScheduleState
                 Key: IMPALA-14771
                 URL: https://issues.apache.org/jira/browse/IMPALA-14771
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 5.0.0
            Reporter: Yida Wu
            Assignee: Yida Wu


IMPALA-14661 adds support for compressing admission requests, but the test 
TestAdmissionControllerWithACService was not correctly applying the start flag 
matrix, so compression was not actually enabled during testing.

After fixing the test, we found that admissiond can hit a [DCHECK|https://github.com/apache/impala/blob/master/be/src/scheduling/schedule-state.cc#L215C3-L215C50] in DequeueLoop, caused by a ScheduleState left dangling after ClearDecompressedCache():
{noformat}
#3  0x000000000187b3fe in impala::ScheduleState::GetPerExecutorMemoryEstimate (this=this@entry=0xc03f800) at /impala/Impala/be/src/scheduling/schedule-state.cc:215
#4  0x000000000187bce5 in impala::ScheduleState::UpdateMemoryRequirements (this=this@entry=0xc03f800, pool_cfg=..., coord_mem_limit_admission=12884901888, executor_mem_limit_admission=12884901888)
    at /impala/Impala/be/src/scheduling/schedule-state.cc:329
#5  0x0000000001812328 in impala::AdmissionController::FindGroupToAdmitOrReject (this=this@entry=0x951fc00, membership_snapshot=..., pool_config=..., root_cfg=..., admit_from_queue=admit_from_queue@entry=true, pool_stats=pool_stats@entry=0xc5c1bb0,
    queue_node=0xc622730, coordinator_resource_limited=@0x7f3991e415fe: false, is_trivial=0x7f3991e415ff) at /impala/Impala/be/src/scheduling/admission-controller.cc:2501
#6  0x0000000001812cf4 in impala::AdmissionController::TryDequeue (this=this@entry=0x951fc00) at /impala/Impala/be/src/scheduling/admission-controller.cc:2686
#7  0x000000000181497a in impala::AdmissionController::DequeueLoop (this=0x951fc00) at /impala/Impala/be/src/scheduling/admission-controller.cc:2646
#8  0x0000000001816f83 in boost::_mfi::mf0<void, impala::AdmissionController>::operator() (p=<optimized out>, this=<optimized out>) at /impala/Impala/toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/bind/mem_fn_template.hpp:49
{noformat}
The root cause is that ScheduleState [depends on the decompressed TQueryExecRequest|https://github.com/apache/impala/blob/master/be/src/scheduling/admission-controller.cc#L2424]. When a query is enqueued, [ClearDecompressedCache() is called|https://github.com/apache/impala/blob/master/be/src/scheduling/admission-controller.cc#L1793] to save memory, which frees the decompressed exec request. However, queue_node->group_states still holds ScheduleState objects that reference this freed request. When the query is later dequeued, TryDequeue() reuses these stale objects through FindGroupToAdmitOrReject(). Reading the freed request then trips the DCHECK.
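To make the lifetime problem concrete, here is a minimal, self-contained C++ sketch. Every type and function in it (ExecRequest, AdmissionRequest, QueueNode, Enqueue, DequeueAndEstimate, Demo) is a hypothetical simplification, not Impala's actual code. It assumes ScheduleState keeps a non-owning pointer into the decompressed cache, as described above. It shows one possible fix: drop the stale group_states together with the cache, so they are rebuilt from a freshly decompressed request on dequeue. The real patch may choose a different approach, for example re-decompressing and refreshing the existing states.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the decompressed TQueryExecRequest.
struct ExecRequest {
  int64_t per_executor_mem_estimate = 0;
};

// Hypothetical stand-in for ScheduleState: it does not own the exec request,
// it only points into the decompressed cache.
class ScheduleState {
 public:
  explicit ScheduleState(const ExecRequest& req) : req_(&req) {}
  int64_t GetPerExecutorMemoryEstimate() const {
    return req_->per_executor_mem_estimate;  // use-after-free if the cache was cleared
  }

 private:
  const ExecRequest* req_;
};

// Hypothetical holder of a compressed admission request plus a lazily
// populated decompressed cache.
class AdmissionRequest {
 public:
  explicit AdmissionRequest(int64_t estimate) : compressed_estimate_(estimate) {}

  const ExecRequest& GetDecompressed() {
    if (!cache_) cache_ = std::make_unique<ExecRequest>(ExecRequest{compressed_estimate_});
    return *cache_;
  }
  void ClearDecompressedCache() { cache_.reset(); }
  bool HasDecompressedCache() const { return cache_ != nullptr; }

 private:
  int64_t compressed_estimate_;  // stands in for the compressed payload
  std::unique_ptr<ExecRequest> cache_;
};

// Hypothetical queue node; group_states mirrors queue_node->group_states.
struct QueueNode {
  AdmissionRequest* request;
  std::vector<ScheduleState> group_states;
};

// Possible fix: drop the ScheduleStates together with the cache they point
// into, so no stale object survives while the query waits in the queue.
void Enqueue(QueueNode& node) {
  node.group_states.clear();
  node.request->ClearDecompressedCache();
}

// On dequeue, rebuild ScheduleStates from a freshly decompressed request
// instead of reusing the ones computed before queuing.
int64_t DequeueAndEstimate(QueueNode& node) {
  if (node.group_states.empty()) {
    node.group_states.emplace_back(node.request->GetDecompressed());
  }
  return node.group_states.front().GetPerExecutorMemoryEstimate();
}

// Walks one query through schedule -> enqueue -> dequeue.
int64_t Demo(int64_t estimate) {
  AdmissionRequest req(estimate);
  QueueNode node{&req, {}};
  node.group_states.emplace_back(req.GetDecompressed());  // scheduled before queuing
  Enqueue(node);
  assert(!req.HasDecompressedCache() && node.group_states.empty());
  return DequeueAndEstimate(node);
}
```

In this sketch, removing `node.group_states.clear()` from `Enqueue` reproduces the bug pattern. `DequeueAndEstimate` would then call `GetPerExecutorMemoryEstimate()` on a ScheduleState whose pointer targets the freed cache. In the real code, that stale access is what surfaces as the DCHECK in DequeueLoop.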



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
