[
https://issues.apache.org/jira/browse/IMPALA-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022035#comment-18022035
]
ASF subversion and git services commented on IMPALA-10047:
----------------------------------------------------------
Commit 16c350ce5aa5dbf1a9ea8b481c64780d445e9a2a in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=16c350ce5 ]
IMPALA-14271: Reapply the core piece of IMPALA-6984
IMPALA-6984 changed the behavior to cancel backends when the query
reaches the RETURNED_RESULTS state. This caused a regression on large
clusters where a query could end up waiting 10 seconds for a cancel RPC
to time out. IMPALA-10047 reverted the core piece of that change.
For tuple caching, we found that a scan node can get stuck waiting
for a global runtime filter. It turns out that the coordinator will
not send out global runtime filters if the query is in a terminal
state. Tuple caching was causing queries to reach the RETURNED_RESULTS
phase before the runtime filter could be sent out. Reenabling the core
part of IMPALA-6984 sends out a cancel as soon as the query transitions
to RETURNED_RESULTS and wakes up any fragment instances waiting on
runtime filters.
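As an illustrative sketch of the wake-up mechanism (hypothetical class and
method names, not Impala's actual runtime filter code), a fragment instance
can block on a condition variable that is signalled either when the filter
arrives or when a cancel comes in:
{noformat}
// Illustrative only: the names below are hypothetical, not Impala's API.
#include <chrono>
#include <condition_variable>
#include <mutex>

class FilterWaiter {
 public:
  // Called by the scan node; returns true if the filter arrived, false if the
  // wait ended because the query was cancelled or the timeout expired.
  bool WaitForRuntimeFilter(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait_for(l, timeout, [this] { return filter_arrived_ || cancelled_; });
    return filter_arrived_;
  }

  // Called when the coordinator delivers the global runtime filter.
  void SetFilter() {
    { std::lock_guard<std::mutex> l(lock_); filter_arrived_ = true; }
    cv_.notify_all();
  }

  // Called on CancelQueryFInstances: wakes any waiters so the fragment
  // instance stops waiting for a filter that a coordinator in a terminal
  // state will never send.
  void Cancel() {
    { std::lock_guard<std::mutex> l(lock_); cancelled_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool filter_arrived_ = false;
  bool cancelled_ = false;
};
{noformat}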
The underlying cause of IMPALA-10047 is a tangle of locks that causes
us to exhaust the RPC threads. The coordinator is holding a lock on the
backend state while it sends the cancel synchronously. When other backends
complete during that time, the coordinator's RPC threads handling their
status reports run Coordinator::BackendState::LogFirstInProgress(), which
iterates through the backend states to find the first one that is not done.
Checking whether a backend state is done takes a lock on that backend
state. The problem case is when the coordinator is sending a cancel to a
backend running on the same host as itself. In that case, an RPC thread on
the coordinator must be available to process the cancel. If all of the RPC
threads are processing status updates, they can all call
LogFirstInProgress() and get stuck on the backend state lock for the
coordinator's fragment. The result is a temporary deadlock: the cancel
can't be processed, and the coordinator won't release the lock. It only
gets resolved by the RPC timing out.
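The shape of that hang can be mimicked outside Impala with a small toy
program. Everything below is hypothetical and only mirrors the structure:
a single "RPC service" thread (like control_service_num_svc_threads=1), a
mutex standing in for the backend state lock, and a "cancel" task that must
run on that thread while the submitter holds the lock.
{noformat}
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// A single-threaded task queue standing in for the control service RPC pool.
class RpcServicePool {
 public:
  RpcServicePool() : worker_([this] { Run(); }) {}
  ~RpcServicePool() {
    { std::lock_guard<std::mutex> l(m_); done_ = true; }
    cv_.notify_all();
    worker_.join();
  }
  void Submit(std::function<void()> task) {
    { std::lock_guard<std::mutex> l(m_); queue_.push(std::move(task)); }
    cv_.notify_one();
  }

 private:
  void Run() {
    while (true) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // only reached once done_ is set
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::thread worker_;
};

int main() {
  RpcServicePool pool;
  std::mutex backend_state_lock;  // stands in for the BackendState lock

  // The coordinator thread takes the backend state lock (as Cancel() did).
  std::unique_lock<std::mutex> coord_lock(backend_state_lock);

  // A status report from another backend: the handler blocks on the lock,
  // like LogFirstInProgress() checking whether a backend state is done.
  pool.Submit([&] { std::lock_guard<std::mutex> l(backend_state_lock); });

  // The CancelQueryFInstances call destined for this host queues up behind it.
  std::promise<void> cancel_done;
  auto cancel_fut = cancel_done.get_future();
  pool.Submit([&] { cancel_done.set_value(); });

  // The coordinator waits synchronously for the cancel while holding the lock.
  if (cancel_fut.wait_for(std::chrono::seconds(2)) ==
      std::future_status::timeout) {
    std::cout << "cancel is stuck: no free RPC thread (would hit RPC timeout)\n";
  }

  coord_lock.unlock();  // dropping the lock (or an RPC timeout) breaks the tie
  cancel_fut.wait();
  std::cout << "cancel processed once the lock was released\n";
  return 0;
}
{noformat}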
To resolve this, the Cancel() method now drops the lock while issuing the
CancelQueryFInstances RPC and reacquires it once the RPC completes.
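A minimal sketch of that pattern (hypothetical member names, not the actual
Coordinator::BackendState::Cancel() body): hold the lock only to read and
update state, and release it across the blocking RPC.
{noformat}
// Illustrative pattern only; names are hypothetical, not the real members.
#include <mutex>

class BackendState {
 public:
  void Cancel() {
    std::unique_lock<std::mutex> lock(lock_);
    if (sent_cancel_rpc_ || is_done_) return;  // nothing to do
    sent_cancel_rpc_ = true;

    lock.unlock();  // let other threads (e.g. RPC handlers) take lock_
    bool rpc_ok = SendCancelQueryFInstancesRpc();  // blocking call, no lock held
    lock.lock();    // reacquire before touching shared state again

    if (!rpc_ok) rpc_failed_ = true;
  }

 private:
  // Stand-in for the synchronous KRPC call.
  bool SendCancelQueryFInstancesRpc() { return true; }

  std::mutex lock_;
  bool sent_cancel_rpc_ = false;
  bool is_done_ = false;
  bool rpc_failed_ = false;
};
{noformat}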
Testing:
- Hand tested with 10 impalads and control_service_num_svc_threads=1.
  Without the fix, it reproduces easily after reverting IMPALA-10047.
  With the fix, it doesn't reproduce.
Change-Id: Ia058b03c72cc4bb83b0bd0a19ff6c8c43a647974
Reviewed-on: http://gerrit.cloudera.org:8080/23264
Reviewed-by: Yida Wu <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Performance regression on short queries due to IMPALA-6984 fix
> --------------------------------------------------------------
>
> Key: IMPALA-10047
> URL: https://issues.apache.org/jira/browse/IMPALA-10047
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
> Fix For: Impala 4.0.0
>
>
> When doing some TPC-DS benchmarking with mt_dop, we encountered intermittent
> performance regressions on short queries. Some query executions seem to be
> taking an extra 10 seconds in exec status reports due to delays in sending a
> cancel RPC. From the coordinator logs:
>
> {noformat}
> W0804 02:52:33.922088 108 rpcz_store.cc:253] Call impala.ControlService.CancelQueryFInstances from 127.0.0.1:46738 (request call id 3134) took 10007 ms (10 s). Client timeout 10000 ms (10 s)
> W0804 02:52:33.922143 108 rpcz_store.cc:259] Trace:
> 0804 02:52:23.914291 (+ 0us) impala-service-pool.cc:170] Inserting onto call queue
> 0804 02:52:33.922079 (+10007788us) impala-service-pool.cc:255] Skipping call since client already timed out
> 0804 02:52:33.922086 (+ 7us) inbound_call.cc:162] Queueing failure response
> Metrics: {}
> I0804 02:52:33.922214 101 connection.cc:730] Got response to call id 3134 after client already timed out or cancelled
> I0804 02:52:33.923286 20276 coordinator-backend-state.cc:889] query_id=f442e73a0d35c136:c9993d7700000000 target backend=xx.xx.xx.xx:27000: Sending CancelQueryFInstances rpc{noformat}
> The rpcz page also shows that some ReportExecStatus RPCs are taking 10
> seconds:
>
>
> {noformat}
> "incoming_queue_time": "Count: 671901, min / max: 1000.000ns / 10s347ms, 25th
> %-ile: 12.000us, 50th %-ile: 18.000us, 75th %-ile: 28.000us, 90th %-ile:
> 67.000us, 95th %-ile: 456.000us, 99.9th %-ile: 10s133ms",
> {
> "method_name": "ReportExecStatus",
> "handler_latency": "Count: 169653, min / max: 38.000us /
> 10s173ms, 25th %-ile: 9.024ms, 50th %-ile: 20.352ms, 75th %-ile: 35.840ms,
> 90th %-ile: 94.720ms, 95th %-ile: 177.152ms, 99.9th %-ile: 10s027ms",
> "payload_size": "Count: 169653, min / max: 5.81 KB / 3.81 MB,
> 25th %-ile: 425.00 KB, 50th %-ile: 760.00 KB, 75th %-ile: 1.47 MB, 90th
> %-ile: 1.96 MB, 95th %-ile: 2.31 MB, 99.9th %-ile: 3.73 MB"
> }]{noformat}
>
> IMPALA-6984 introduced a Coordinator::CancelBackends() call to
> Coordinator::HandleExecStateTransition() for the ExecState::RETURNED_RESULTS
> case:
> {noformat}
> if (new_state == ExecState::RETURNED_RESULTS) {
>   // Cancel all backends, but wait for the final status reports to be received
>   // so that we have a complete profile for this successful query.
>   CancelBackends(/*fire_and_forget=*/ false);
>   WaitForBackends();
> } else {
>   CancelBackends(/*fire_and_forget=*/ true);
> }{noformat}
> Removing this call eliminates the performance regression, so it will need
> more investigation.
>