[ 
https://issues.apache.org/jira/browse/IMPALA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1575.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

IMPALA-1575: part 2: yield admission control resources

This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

As in part 1, query execution is considered finished under any of the
following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query
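
The four conditions above can be modelled as a small state check. This is a minimal Python sketch, not Impala's C++ code: `EndReason`, `ClientRequest`, and the `release` callback are hypothetical names introduced for illustration, and firing the release at most once is an assumption of the sketch.

```python
from enum import Enum
from typing import Callable, Optional


class EndReason(Enum):
    """Hypothetical labels for the four end-of-execution conditions."""
    ERROR = "error"                  # 1. the query hit an error and failed
    IDLE_TIMEOUT = "idle_timeout"    # 2. cancelled by the idle query timeout
    EOS = "eos"                      # 3. reached eos, or the DML completed
    CLIENT_CANCEL = "client_cancel"  # 4. client cancelled without closing


class ClientRequest:
    """Toy stand-in for an open client request (not an Impala class)."""

    def __init__(self, release: Callable[[EndReason], None]) -> None:
        self._release = release
        self.end_reason: Optional[EndReason] = None

    @property
    def execution_finished(self) -> bool:
        # Finished as soon as any condition holds, even if still open.
        return self.end_reason is not None

    def end(self, reason: EndReason) -> None:
        # Assumption: only the first end condition triggers the release.
        if self.end_reason is None:
            self.end_reason = reason
            self._release(reason)
```

In this model, a request that reaches eos and is later cancelled by the client invokes `release` only for the first event.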

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory are
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".
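
The two release paths can be illustrated with a toy accounting model. This is a hedged Python sketch, not the actual AdmissionController or MemTracker code: `Query`, `Pool`, and all method names are hypothetical, and counting a finished query's current consumption as its reserved memory is an assumption (the text above says only that the full limit is no longer treated as reserved).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Query:
    query_id: str
    mem_limit: int          # bytes reserved while the query executes
    mem_consumed: int = 0   # bytes currently tracked for the query
    executing: bool = True  # False once execution has finished


@dataclass
class Pool:
    max_requests: int
    admitted: Dict[str, Query] = field(default_factory=dict)
    open_queries: Dict[str, Query] = field(default_factory=dict)

    def admit(self, q: Query) -> bool:
        if len(self.admitted) >= self.max_requests:
            return False
        self.admitted[q.query_id] = q
        self.open_queries[q.query_id] = q
        return True

    def finish_execution(self, q: Query) -> None:
        # Way 1: drop the query from the admitted set right away,
        # without waiting for the client to close it.
        q.executing = False
        self.admitted.pop(q.query_id, None)

    def close(self, q: Query) -> None:
        self.finish_execution(q)
        self.open_queries.pop(q.query_id, None)

    def mem_reserved(self) -> int:
        # Way 2: an executing query reserves its whole limit; a finished
        # but still open query counts only what it consumes (assumption).
        return sum(q.mem_limit if q.executing else q.mem_consumed
                   for q in self.open_queries.values())
```

With `max_requests=1`, a second query is rejected while the first executes, but is admitted once `finish_execution()` runs on the first, even though the first is still open.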

The preconditions for releasing an admitted query are subtle because the
queries are being admitted to a distributed system, not just the
coordinator.  The comment for ReleaseAdmissionControlResources()
documents the preconditions and rationale. Note that the preconditions
are not weaker than the preconditions of calling UnregisterQuery()
before this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior. This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

I ran into a problem similar to IMPALA-3772 with the admission control
tests becoming flaky due to query timeouts on release builds, which I
solved in a similar way by increasing the frequency of statestore
updates.

This is based on an earlier patch by Joe McDonnell.

Change-Id: Ib1fae8dc1c4b0eca7bfa8fadae4a56ef2b37947a
Reviewed-on: http://gerrit.cloudera.org:8080/8581
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins


> Cancelled queries do not yield resources until close
> ----------------------------------------------------
>
>                 Key: IMPALA-1575
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1575
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.1, Impala 2.3.0
>            Reporter: Henry Robinson
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: query-lifecycle, resource-management
>             Fix For: Impala 2.11.0
>
>
> A cancelled query (for example due to a timeout) or a query that has reached 
> eos (but not explicitly closed) holds (1) resources on the coordinator 
> fragment, (2) all resources accounted by the admission controller, (3) llama 
> reservations. (However, Llama has been unsupported for CDH 5.5 and beyond, so 
> (3) will no longer apply.) None of these are released until the query is 
> closed, which may not happen promptly for some clients.
> This frequently occurs with Hue. Hue (and some other clients that behave 
> similarly) will not close a query until the user explicitly closes it (in 
> the Hue case this is via a JavaScript callback sent by the browser when closing the Hue 
> tab). If the query is left unattended (or the Hue tab is on a laptop that is 
> closed, or the browser crashes), the close call is never sent, and while the 
> query will "time out", the cancellation doesn't properly clean up resources.
> One way to mitigate this issue in this case is by using the 
> --idle_session_timeout impalad argument to fully close a session and all 
> associated queries after some amount of time (but this is not a workaround 
> that works in all cases).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
