Jason Fehr created IMPALA-14433:
-----------------------------------

Summary: Deadlock in OpenTelemetry Tracing Code
Key: IMPALA-14433
URL: https://issues.apache.org/jira/browse/IMPALA-14433
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 5.0.0
Reporter: Jason Fehr
Assignee: Jason Fehr
All functions in the SpanManager class assume a lock order in which SpanManager's child_span_mu_ is acquired before the ClientRequestState lock. However, ImpalaServer::ExecuteInternal acquires the ClientRequestState lock first and then calls SpanManager::EndChildSpanPlanning, which locks child_span_mu_. Because the two code paths take the same pair of locks in opposite orders, they can deadlock.

Simplified Explanation:
1. Thread 1 -- ImpalaServer::ExecuteInternal takes the ClientRequestState lock.
2. Thread 2 -- a SpanManager function (such as StartChildSpanClose) locks child_span_mu_.
3. Thread 2 -- attempts to take the ClientRequestState lock and waits because Thread 1 holds it.
4. Thread 1 -- ImpalaServer::ExecuteInternal calls SpanManager::EndChildSpanPlanning.
5. Thread 1 -- attempts to lock child_span_mu_ and waits because Thread 2 holds it. Neither thread can proceed.

Detailed Explanation:
The deadlock occurs when another SpanManager function (such as StartChildSpanClose) runs after ImpalaServer::ExecuteInternal has acquired the ClientRequestState lock but before ImpalaServer::ExecuteInternal calls SpanManager::EndChildSpanPlanning. In that window, the other function locks child_span_mu_ and then tries to take the ClientRequestState lock. Because ImpalaServer::ExecuteInternal already holds that lock, the other function blocks. When ImpalaServer::ExecuteInternal then calls SpanManager::EndChildSpanPlanning, that call tries to lock child_span_mu_, which the blocked function still holds, so each thread waits on the other indefinitely.