[
https://issues.apache.org/jira/browse/IMPALA-14271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell updated IMPALA-14271:
-----------------------------------
Description:
In benchmarking with the cost-based placement, some fragments get stuck
waiting for runtime filters for a couple of queries. In particular, Q5 for
TPC-H and Q61 for TPC-DS get stuck. They time out of the wait after 5 seconds,
which is a significant performance regression.
The runtime filters that they are waiting for are remote, so they pass through
the coordinator. The problem is that if the query returns results quickly, the
coordinator will transition from EXECUTING to RETURNED_RESULTS. The code that
is processing the runtime filter on the coordinator in
Coordinator::UpdateFilter() will bail out if the query is no longer executing:
{noformat}
if (!IsExecuting()) {
  LOG(INFO) << "Filter update received for non-executing query with id: "
            << PrintId(query_id());
  return;
}
{noformat}
The fragment that is waiting on the runtime filter never receives it, so it
blocks until it hits the runtime filter wait time or the query is cancelled,
and neither happens until seconds later.
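For context, the consumer side of a runtime filter behaves like a timed wait:
the fragment blocks until the filter arrives or the wait time elapses. A
minimal standalone sketch of that pattern (illustrative C++ only; FilterSlot
and the 5-second wait are made up for the example, not Impala's actual
implementation):
{noformat}
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>

// Illustrative stand-in for a fragment waiting on a remote runtime filter.
class FilterSlot {
 public:
  // Publisher side: called when the coordinator forwards the filter.
  void Publish() {
    std::lock_guard<std::mutex> l(lock_);
    arrived_ = true;
    cv_.notify_all();
  }

  // Consumer side: block until the filter arrives or the wait time elapses.
  // Returns true if the filter arrived in time.
  bool WaitFor(std::chrono::milliseconds wait_time) {
    std::unique_lock<std::mutex> l(lock_);
    return cv_.wait_for(l, wait_time, [this] { return arrived_; });
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool arrived_ = false;
};

int main() {
  FilterSlot slot;
  // If the coordinator drops the update (the bug above), Publish() is never
  // called and the fragment stalls for the full wait time before proceeding.
  bool arrived = slot.WaitFor(std::chrono::milliseconds(5000));
  std::cout << (arrived ? "filter arrived" : "timed out, proceeding unfiltered")
            << std::endl;
  return 0;
}
{noformat}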
One solution for this is to reapply the core logic from IMPALA-6984 (i.e.
reverting IMPALA-10047). That immediately sends a cancel when the query
transitions to RETURNED_RESULTS.
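A rough model of that fix (the class and method names below are invented for
the sketch; the real change would live in the coordinator's exec-state
transition handling):
{noformat}
#include <iostream>

// Illustrative model only; names do not match Impala's actual classes.
enum class ExecState { EXECUTING, RETURNED_RESULTS, CANCELLED, ERROR };

class CoordinatorModel {
 public:
  void TransitionTo(ExecState next) {
    state_ = next;
    // IMPALA-6984-style behavior: once every result row has been returned,
    // cancel the backends immediately instead of letting fragments sit in
    // their runtime-filter waits until the timeout fires.
    if (state_ == ExecState::RETURNED_RESULTS) CancelBackends();
  }

 private:
  void CancelBackends() { std::cout << "cancelling backend fragments\n"; }

  ExecState state_ = ExecState::EXECUTING;
};

int main() {
  CoordinatorModel coord;
  // The cancel unblocks any fragment still waiting on a remote filter.
  coord.TransitionTo(ExecState::RETURNED_RESULTS);
  return 0;
}
{noformat}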
This should only be a problem if all fragment instances hit the tuple cache. If
one fragment instance does not, then the query won't transition to
RETURNED_RESULTS before the runtime filter is processed, because the fragment
instance still needs the hash table.
was:
In benchmarking with the cost-based placement, some fragments get stuck
waiting for runtime filters for a couple of queries. In particular, Q5 for
TPC-H and Q61 for TPC-DS get stuck. They time out of the wait after 5 seconds,
which is a significant performance regression.
Consider a plan like this, where we are caching above a string of hash joins.
With mt_dop, each hash join has a separate fragment on the build side:
{noformat}
Fragment 1:
  Cache location
  Hash join 3 <--- broadcast: build side fragment 2
  Hash join 2 <--- broadcast: build side fragment 3
  Hash join 1 <--- broadcast: build side fragment 4
  Probe scan node
{noformat}
An example problematic runtime filter goes from hash join 3 to hash join 2's
build side fragment 3. With a cache hit above everything, the runtime filter
never gets generated, so build side fragment 3 is waiting for a filter that
never comes.
We need some approach to handle this to avoid the performance issue.
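One conceivable way to detect the hazard at plan time (purely a sketch of the
idea, not a concrete proposal from this issue; the types and the
StrandedFilters helper are hypothetical, and Impala's planner is Java rather
than C++) is to walk the filter routing information and flag any filter whose
producer sits under the cache location but whose consumer lives in a
different fragment:
{noformat}
#include <iostream>
#include <vector>

// Hypothetical filter-routing record, for illustration only.
struct FilterEdge {
  int filter_id;
  int producer_fragment;  // fragment with the join that builds the filter
  int consumer_fragment;  // fragment whose scan waits on the filter
};

// Returns filters that would never be generated on a cache hit in
// 'cached_fragment' yet are awaited by some other fragment.
std::vector<int> StrandedFilters(const std::vector<FilterEdge>& edges,
                                 int cached_fragment) {
  std::vector<int> stranded;
  for (const FilterEdge& e : edges) {
    if (e.producer_fragment == cached_fragment &&
        e.consumer_fragment != cached_fragment) {
      stranded.push_back(e.filter_id);
    }
  }
  return stranded;
}

int main() {
  // Mirrors the plan above: hash join 3 lives in fragment 1 and sends a
  // filter to hash join 2's build side in fragment 3.
  std::vector<FilterEdge> edges = {
      {/*filter_id=*/7, /*producer=*/1, /*consumer=*/3}};
  for (int id : StrandedFilters(edges, /*cached_fragment=*/1)) {
    std::cout << "filter " << id << " would be stranded by a cache hit\n";
  }
  return 0;
}
{noformat}
A planner that finds such an edge could, for example, move the cache location
below the producing join or disable the affected filter, though the issue
leaves the actual approach open.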
> Tuple caching needs to handle runtime filters destined for other fragments
> --------------------------------------------------------------------------
>
> Key: IMPALA-14271
> URL: https://issues.apache.org/jira/browse/IMPALA-14271
> Project: IMPALA
> Issue Type: Task
> Components: Backend, Frontend
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Priority: Major
>