[ https://issues.apache.org/jira/browse/IMPALA-14271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-14271.
------------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

> Tuple caching gets stuck waiting for a runtime filter that will never arrive
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-14271
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14271
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend, Frontend
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> In benchmarking with cost-based placement, some fragments in a couple of
> queries get stuck waiting for runtime filters. In particular, Q5 for TPC-H
> and Q61 for TPC-DS get stuck. They time out of the wait after 5 seconds,
> which is a significant performance regression.
> The runtime filters that they are waiting for are remote, so they pass
> through the coordinator. The problem is that if the query returns results
> quickly, the coordinator will transition from EXECUTING to RETURNED_RESULTS.
> The code that processes the runtime filter on the coordinator in
> Coordinator::UpdateFilter() will bail out if the query is no longer executing:
> {noformat}
>   if (!IsExecuting()) {
>     LOG(INFO) << "Filter update received for non-executing query with id: "
>               << PrintId(query_id());
>     return;
>   }{noformat}
> The fragment that is waiting on the runtime filter won't receive it, so it
> waits until it hits the runtime filter wait time or the query is cancelled.
> The problem is that neither happens until seconds later.
> One solution for this is to reapply the core logic from IMPALA-6984 (i.e.
> reverting IMPALA-10047). That immediately sends a cancel when the query
> transitions to RETURNED_RESULTS.
> This should only be a problem if all fragment instances hit the tuple cache.
> If one fragment instance does not, then the query won't transition to
> RETURNED_RESULTS before the runtime filter is processed, because the fragment
> instance still needs the hash table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)