[ https://issues.apache.org/jira/browse/IMPALA-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875346#comment-17875346 ]
ASF subversion and git services commented on IMPALA-13313:
----------------------------------------------------------
Commit 4b500a55cbfcdd311a1c766e33849f7ae05a1a8e in impala's branch
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4b500a55c ]
IMPALA-13313: Fix ExpireQueries deadlock
IMPALA-12602 introduced registering idle queries with a session so that
we can expire queries while still making their status available, and
clean up the idle query status when sessions are closed. That happens in
ImpalaServer::ExpireQueries, where it needs to acquire the
query_expiration_lock_ then a session_state->lock.
However, that violated the lock order documented in impala-server.h and
led to a deadlock when one query is expired at the same time another query
is registering expiration timers (which follows the documented order).
When the deadlock occurs, SetQueryInFlight holds a session_state->lock
and tries to acquire query_expiration_lock_, while ExpireQueries holds
the query_expiration_lock_ and tries to acquire session_state->lock.
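The inversion described above can be reproduced in miniature. The following is a minimal sketch, not Impala code: DemoLocks, HoldFirstThenProbeSecond and OppositeLockOrdersConflict are hypothetical names, and the second lock is probed with try_lock() so that the program reports the conflict instead of hanging the way a real deadlock would.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks named above; not Impala's types.
struct DemoLocks {
  std::mutex session_lock;           // plays the role of session_state->lock
  std::mutex query_expiration_lock;  // plays the role of query_expiration_lock_
};

// Holds `first`, waits until the peer thread holds its own first lock, then
// probes `second` with try_lock() so the conflict is reported instead of
// hanging. Returns true if `second` could be acquired.
bool HoldFirstThenProbeSecond(std::mutex& first, std::mutex& second,
                              std::atomic<int>& holding,
                              std::atomic<int>& probed) {
  std::lock_guard<std::mutex> guard(first);
  holding.fetch_add(1);
  while (holding.load() < 2) std::this_thread::yield();
  const bool acquired = second.try_lock();
  if (acquired) second.unlock();
  probed.fetch_add(1);
  // Keep `first` held until both threads have probed.
  while (probed.load() < 2) std::this_thread::yield();
  return acquired;
}

// Returns true when each thread found the other's lock already taken, which
// is the state in which blocking lock() calls would wait forever.
bool OppositeLockOrdersConflict() {
  DemoLocks locks;
  std::atomic<int> holding{0};
  std::atomic<int> probed{0};
  bool set_in_flight_acquired = true;
  bool expire_queries_acquired = true;
  // Session lock first, then expiration lock (the SetQueryInFlight side).
  std::thread set_in_flight([&] {
    set_in_flight_acquired = HoldFirstThenProbeSecond(
        locks.session_lock, locks.query_expiration_lock, holding, probed);
  });
  // Expiration lock first, then session lock (the ExpireQueries side).
  std::thread expire_queries([&] {
    expire_queries_acquired = HoldFirstThenProbeSecond(
        locks.query_expiration_lock, locks.session_lock, holding, probed);
  });
  set_in_flight.join();
  expire_queries.join();
  return !set_in_flight_acquired && !expire_queries_acquired;
}
```

With blocking lock() calls in place of try_lock(), neither thread could make progress once both hold their first lock.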
The prior order between query_expiration_lock_ and session_state->lock
was largely arbitrary. query_expiration_lock_ operations don't
inherently require holding the session_state->lock. However, expiration
operations work on a queue of ClientRequestStates that map to different
session states, so when we need to operate on a session state as part of
expiration we pretty much have to take query_expiration_lock_ first.
Updates lock order to take query_expiration_lock_ before
session_state->lock, and modifies SetQueryInFlight to release the
session_state->lock before registering expiration timers. The expiration
timers aren't related to the session, and query lifetime is maintained
by the QueryHandle reference.
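The updated ordering can be sketched as follows. This is an illustration under simplifying assumptions, not the actual patch: DemoSession, DemoServer and RunFixedOrderDemo are hypothetical names, queries are plain integer ids, and the expiration queue is a vector. The registering path drops the session lock before touching expiration state, and the expiring path takes the expiration lock before any session lock.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-ins; not Impala's classes.
struct DemoSession {
  std::mutex lock;                    // plays the role of session_state->lock
  std::vector<int> inflight_queries;  // query ids registered with the session
};

struct DemoServer {
  std::mutex query_expiration_lock;   // plays the role of query_expiration_lock_
  std::vector<int> expiration_queue;  // query ids with expiration timers

  void SetQueryInFlight(DemoSession& session, int query_id) {
    {
      std::lock_guard<std::mutex> session_guard(session.lock);
      session.inflight_queries.push_back(query_id);
    }  // Session lock is released before registering expiration timers.
    std::lock_guard<std::mutex> expiration_guard(query_expiration_lock);
    expiration_queue.push_back(query_id);
  }

  // Takes the expiration lock first, then the session lock.
  // Returns the number of queries removed from the session.
  int ExpireQueries(DemoSession& session) {
    std::lock_guard<std::mutex> expiration_guard(query_expiration_lock);
    int expired = 0;
    for (int query_id : expiration_queue) {
      std::lock_guard<std::mutex> session_guard(session.lock);
      auto& inflight = session.inflight_queries;
      auto it = std::find(inflight.begin(), inflight.end(), query_id);
      if (it != inflight.end()) {
        inflight.erase(it);
        ++expired;
      }
    }
    expiration_queue.clear();
    return expired;
  }
};

// Registers and expires queries concurrently; returns the total expired.
int RunFixedOrderDemo(int num_queries) {
  DemoServer server;
  DemoSession session;
  int expired = 0;
  std::thread registrar([&] {
    for (int id = 0; id < num_queries; ++id) server.SetQueryInFlight(session, id);
  });
  std::thread expirer([&] {
    for (int i = 0; i < num_queries; ++i) expired += server.ExpireQueries(session);
  });
  registrar.join();
  expirer.join();
  expired += server.ExpireQueries(session);  // Drain whatever is left.
  return expired;
}
```

Because no thread in this sketch ever waits for query_expiration_lock while holding a session lock, the circular wait from the deadlock scenario cannot form.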
Adds a custom cluster test that uses debug actions to reproduce the
deadlock scenario.
Change-Id: I6fce4103f6eeb7e9a4320ba1da817cab81071ba3
Reviewed-on: http://gerrit.cloudera.org:8080/21699
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Michael Smith <[email protected]>
> Potential deadlock in ImpalaServer::ExpireQueries()
> ---------------------------------------------------
>
> Key: IMPALA-13313
> URL: https://issues.apache.org/jira/browse/IMPALA-13313
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.4.0
> Reporter: Yida Wu
> Assignee: Michael Smith
> Priority: Critical
> Fix For: Impala 4.5.0
>
>
> IMPALA-12602 introduces a way to unregister a query from a session when
> idle_query_timeout is reached. However, it also adds logic in
> ExpireQueries() that tries to acquire SessionState::lock while holding
> query_expiration_lock_. This violates the lock order defined in
> [impala-server.h|https://github.com/apache/impala/blob/9848cd84be6ed07fe542b82d2e2628e658690621/be/src/service/impala-server.h#L187]
> and can result in a deadlock.
> For example, it can deadlock with
> [SetInFlight()|https://github.infra.cloudera.com/CDH/Impala/blob/1e4e196b53c2ba88c58d13dfb3709b849767a109/be/src/service/impala-server.cc#L1386],
> which may try to acquire query_expiration_lock_ while holding
> SessionState::lock.