Jason Fehr created IMPALA-14433:
-----------------------------------

             Summary: Deadlock in OpenTelemetry Tracing Code
                 Key: IMPALA-14433
                 URL: https://issues.apache.org/jira/browse/IMPALA-14433
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 5.0.0
            Reporter: Jason Fehr
            Assignee: Jason Fehr


Every function in the SpanManager class assumes that child_span_mu_ is locked 
before the ClientRequestState lock. However, ImpalaServer::ExecuteInternal 
takes the ClientRequestState lock before calling 
SpanManager::EndChildSpanPlanning, which inverts that lock order.

Simplified Explanation:
1. Thread 1 -- ImpalaServer::ExecuteInternal takes ClientRequestState lock
2. Thread 2 -- a SpanManager function (such as StartChildSpanClose) locks 
child_span_mu_
3. Thread 2 -- attempts to take ClientRequestState lock, waits because Thread 1 
owns that lock
4. Thread 1 -- ImpalaServer::ExecuteInternal calls 
SpanManager::EndChildSpanPlanning
5. Thread 1 -- attempts to take child_span_mu_ lock, waits because Thread 2 
owns that lock
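
The five steps above can be reproduced in a small standalone sketch. This is 
not Impala code: crs_lock and child_span_mu are hypothetical stand-ins for the 
ClientRequestState lock and SpanManager::child_span_mu_, and try_lock replaces 
the blocking lock calls so the program reports the inversion instead of 
hanging. The promises only force the interleaving described in the steps.

```cpp
#include <future>
#include <mutex>
#include <thread>

// Records whether either thread managed to take its second lock.
struct InversionResult {
  bool thread1_got_child_span_mu = false;
  bool thread2_got_crs_lock = false;
};

InversionResult DemonstrateInversion() {
  std::mutex crs_lock;       // stand-in for the ClientRequestState lock
  std::mutex child_span_mu;  // stand-in for SpanManager::child_span_mu_

  // One-shot signals used to force the interleaving from the steps above.
  std::promise<void> t1_holds, t2_holds, t1_tried, t2_tried;
  std::future<void> t1_holds_f = t1_holds.get_future();
  std::future<void> t2_holds_f = t2_holds.get_future();
  std::future<void> t1_tried_f = t1_tried.get_future();
  std::future<void> t2_tried_f = t2_tried.get_future();

  InversionResult result;

  // Thread 1 plays the role of ImpalaServer::ExecuteInternal.
  std::thread t1([&] {
    std::lock_guard<std::mutex> guard(crs_lock);  // step 1
    t1_holds.set_value();
    t2_holds_f.wait();  // wait until thread 2 owns child_span_mu
    // Steps 4-5: a blocking lock() here would wait forever.
    if (child_span_mu.try_lock()) {
      result.thread1_got_child_span_mu = true;
      child_span_mu.unlock();
    }
    t1_tried.set_value();
    t2_tried_f.wait();  // keep crs_lock held until thread 2 has tried
  });

  // Thread 2 plays the role of a SpanManager function.
  std::thread t2([&] {
    std::lock_guard<std::mutex> guard(child_span_mu);  // step 2
    t2_holds.set_value();
    t1_holds_f.wait();  // wait until thread 1 owns crs_lock
    // Step 3: a blocking lock() here would wait forever.
    if (crs_lock.try_lock()) {
      result.thread2_got_crs_lock = true;
      crs_lock.unlock();
    }
    t2_tried.set_value();
    t1_tried_f.wait();  // keep child_span_mu held until thread 1 has tried
  });

  t1.join();
  t2.join();
  return result;
}
```

Both fields of the returned InversionResult stay false: each thread fails to 
take the lock the other one owns, which is the point where the real code 
blocks.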

Detailed Explanation:
The deadlock happens when another function (such as StartChildSpanClose) is 
called after ImpalaServer::ExecuteInternal has taken the ClientRequestState 
lock but before it calls SpanManager::EndChildSpanPlanning. In this case, the 
other function locks child_span_mu_ and then tries to take the 
ClientRequestState lock. Since ImpalaServer::ExecuteInternal already holds 
that lock, the other function waits. Then, when ImpalaServer::ExecuteInternal 
calls SpanManager::EndChildSpanPlanning, it tries to lock child_span_mu_, 
which the other function still holds. Each thread now waits on a lock owned by 
the other, so neither can make progress.
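
One possible way to avoid this kind of inversion is for the caller to release 
the ClientRequestState lock before calling into SpanManager, so that every 
path takes child_span_mu_ first. The sketch below only illustrates that idea 
and is not the fix applied in Impala: FakeRequestState, FakeSpanManager, 
ExecuteLike and RunBothPaths are hypothetical names invented for the example.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-in for ClientRequestState; not the Impala class.
struct FakeRequestState {
  std::mutex lock;
  int spans_ended = 0;  // guarded by lock
};

// Hypothetical stand-in for SpanManager; not the Impala class.
class FakeSpanManager {
 public:
  explicit FakeSpanManager(FakeRequestState* crs) : crs_(crs) {}

  // Follows the assumed order: child_span_mu_ first, request state lock second.
  void EndChildSpan() {
    std::lock_guard<std::mutex> span_guard(child_span_mu_);
    std::lock_guard<std::mutex> crs_guard(crs_->lock);
    ++crs_->spans_ended;
  }

 private:
  std::mutex child_span_mu_;
  FakeRequestState* crs_;
};

// Caller that needs the request state lock for its own work. It drops that
// lock before calling into the span manager, so it never holds the request
// state lock while waiting for child_span_mu_.
void ExecuteLike(FakeRequestState* crs, FakeSpanManager* spans) {
  {
    std::lock_guard<std::mutex> crs_guard(crs->lock);
    // ... work that needs the request state lock ...
  }  // request state lock released here
  spans->EndChildSpan();
}

// Runs both code paths concurrently and returns the number of spans ended.
int RunBothPaths(int iterations) {
  FakeRequestState crs;
  FakeSpanManager spans(&crs);
  std::thread t1([&] {
    for (int i = 0; i < iterations; ++i) ExecuteLike(&crs, &spans);
  });
  std::thread t2([&] {
    for (int i = 0; i < iterations; ++i) spans.EndChildSpan();
  });
  t1.join();
  t2.join();
  std::lock_guard<std::mutex> guard(crs.lock);
  return crs.spans_ended;
}
```

Because both threads acquire the two locks in the same order, the loop 
finishes and RunBothPaths(n) returns 2 * n. Whether releasing the lock early 
is safe in ImpalaServer::ExecuteInternal depends on invariants not described 
in this report.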



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
