[ 
https://issues.apache.org/jira/browse/IMPALA-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Fehr resolved IMPALA-14433.
---------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

> Deadlock in OpenTelemetry Tracing Code
> --------------------------------------
>
>                 Key: IMPALA-14433
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14433
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 5.0.0
>            Reporter: Jason Fehr
>            Assignee: Jason Fehr
>            Priority: Critical
>             Fix For: Impala 5.0.0
>
>
> All functions in the SpanManager class assume that child_span_mu_ is locked
> before the ClientRequestState lock. However, ImpalaServer::ExecuteInternal
> takes the ClientRequestState lock before calling
> SpanManager::EndChildSpanPlanning, which then locks child_span_mu_. The two
> locks are therefore acquired in the opposite order.
>
> Simplified Explanation:
> 1. Thread 1 -- ImpalaServer::ExecuteInternal takes ClientRequestState lock
> 2. Thread 2 -- a SpanManager function (such as StartChildSpanClose) locks
> child_span_mu_
> 3. Thread 2 -- attempts to take ClientRequestState lock, waits because Thread
> 1 owns that lock
> 4. Thread 1 -- ImpalaServer::ExecuteInternal calls
> SpanManager::EndChildSpanPlanning
> 5. Thread 1 -- attempts to take child_span_mu_ lock but waits because Thread
> 2 owns that lock
>
> Detailed Explanation:
> The deadlock happens when another function (such as StartChildSpanClose) is
> called after ImpalaServer::ExecuteInternal has taken the ClientRequestState
> lock but before ExecuteInternal calls SpanManager::EndChildSpanPlanning. In
> this case, the other function locks child_span_mu_ and then tries to take the
> ClientRequestState lock. Since ImpalaServer::ExecuteInternal already holds
> that lock, the other function waits. Then, when
> ImpalaServer::ExecuteInternal calls SpanManager::EndChildSpanPlanning, it
> tries to lock child_span_mu_, which the other function already holds, so
> neither thread can proceed.
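
The lock-order problem described above can be sketched in a few lines of
standard C++. This is only an illustration of one common way to avoid this
class of deadlock (never hold the second lock while calling into code that
takes the first); it is not necessarily the fix that was applied in Impala.
The Locks struct and the functions SpanManagerStyleCall, ExecuteStyleCall, and
RunBothThreads are hypothetical stand-ins, not Impala code.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks named in the issue. These are not
// Impala's real classes; only the acquisition order matters here.
struct Locks {
  std::mutex child_span_mu;      // plays the role of child_span_mu_
  std::mutex client_request_mu;  // plays the role of the ClientRequestState lock
};

// Follows the order the issue says SpanManager functions assume:
// child_span_mu first, then client_request_mu.
void SpanManagerStyleCall(Locks& locks, int& counter) {
  std::lock_guard<std::mutex> span_guard(locks.child_span_mu);
  std::lock_guard<std::mutex> request_guard(locks.client_request_mu);
  ++counter;  // counter is only ever modified under client_request_mu
}

// Caller that needs the request lock for its own work. It releases that lock
// before calling into span code, so it never holds client_request_mu while
// waiting for child_span_mu.
void ExecuteStyleCall(Locks& locks, int& counter) {
  {
    std::lock_guard<std::mutex> request_guard(locks.client_request_mu);
    ++counter;
  }  // client_request_mu is released here
  SpanManagerStyleCall(locks, counter);
}

// Runs both call paths concurrently and returns the final counter value.
int RunBothThreads(int iterations) {
  assert(iterations >= 0);
  Locks locks;
  int counter = 0;
  std::thread t1([&] {
    for (int i = 0; i < iterations; ++i) ExecuteStyleCall(locks, counter);
  });
  std::thread t2([&] {
    for (int i = 0; i < iterations; ++i) SpanManagerStyleCall(locks, counter);
  });
  t1.join();
  t2.join();
  return counter;
}
```

Because both paths only ever wait for child_span_mu while holding no other
lock, the circular wait from the simplified explanation cannot form. When both
locks must be held at once, std::scoped_lock (C++17) is another option, since
it acquires multiple mutexes using a deadlock-avoidance algorithm.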



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

