Joe McDonnell created IMPALA-10047:
--------------------------------------
Summary: Performance regression on short queries due to
IMPALA-6984 fix
Key: IMPALA-10047
URL: https://issues.apache.org/jira/browse/IMPALA-10047
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell
When doing some TPC-DS benchmarking with mt_dop, we encountered intermittent
performance regressions on short queries. Some query executions seem to be
taking an extra 10 seconds in exec status reports due to delays in sending a
cancel RPC. From the coordinator logs:
{noformat}
W0804 02:52:33.922088 108 rpcz_store.cc:253] Call
impala.ControlService.CancelQueryFInstances from 127.0.0.1:46738 (request call
id 3134) took 10007 ms (10 s). Client timeout 10000 ms (10 s)
W0804 02:52:33.922143 108 rpcz_store.cc:259] Trace:
0804 02:52:23.914291 (+ 0us) impala-service-pool.cc:170] Inserting onto
call queue
0804 02:52:33.922079 (+10007788us) impala-service-pool.cc:255] Skipping call
since client already timed out
0804 02:52:33.922086 (+ 7us) inbound_call.cc:162] Queueing failure response
Metrics: {}
I0804 02:52:33.922214 101 connection.cc:730] Got response to call id 3134
after client already timed out or cancelled
I0804 02:52:33.923286 20276 coordinator-backend-state.cc:889]
query_id=f442e73a0d35c136:c9993d7700000000 target backend=xx.xx.xx.xx:27000:
Sending CancelQueryFInstances rpc{noformat}
The rpcz page also shows that some ReportExecStatus RPCs are taking 10 seconds:
{noformat}
"incoming_queue_time": "Count: 671901, min / max: 1000.000ns / 10s347ms, 25th
%-ile: 12.000us, 50th %-ile: 18.000us, 75th %-ile: 28.000us, 90th %-ile:
67.000us, 95th %-ile: 456.000us, 99.9th %-ile: 10s133ms",
{
"method_name": "ReportExecStatus",
"handler_latency": "Count: 169653, min / max: 38.000us /
10s173ms, 25th %-ile: 9.024ms, 50th %-ile: 20.352ms, 75th %-ile: 35.840ms, 90th
%-ile: 94.720ms, 95th %-ile: 177.152ms, 99.9th %-ile: 10s027ms",
"payload_size": "Count: 169653, min / max: 5.81 KB / 3.81 MB,
25th %-ile: 425.00 KB, 50th %-ile: 760.00 KB, 75th %-ile: 1.47 MB, 90th %-ile:
1.96 MB, 95th %-ile: 2.31 MB, 99.9th %-ile: 3.73 MB"
}]{noformat}
IMPALA-6984 introduced a Coordinator::CancelBackends() call to
Coordinator::HandleExecStateTransition() for the ExecState::RETURNED_RESULTS
case:
{noformat}
if (new_state == ExecState::RETURNED_RESULTS) {
// Cancel all backends, but wait for the final status reports to be
received so that
// we have a complete profile for this successful query.
CancelBackends(/*fire_and_forget=*/ false);
WaitForBackends();
} else {
CancelBackends(/*fire_and_forget=*/ true);
}{noformat}
Removing this call eliminates the performance regression, so it will need more
investigation.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]