[
https://issues.apache.org/jira/browse/IMPALA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-3160.
-----------------------------------
Resolution: Won't Fix
This seems largely an academic issue at this point, so there's no point keeping
it open.
> Queries may not get cancelled if cancellation pool hits
> MAX_CANCELLATION_QUEUE_SIZE
> -----------------------------------------------------------------------------------
>
> Key: IMPALA-3160
> URL: https://issues.apache.org/jira/browse/IMPALA-3160
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 2.5.0
> Reporter: Sailesh Mukil
> Assignee: Thomas Tauber-Marshall
> Priority: Minor
>
> The ImpalaServer::MembershipCallback() function uses the statestore's topic
> updates to determine whether any backends have gone down. It also cancels all
> queries that are in flight on those failed backends, by matching each failed
> backend from the topic update against the query_locations_ map, which maps
> each backend to the queries running on it.
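>
> For context, here is a standalone sketch of roughly what that mapping looks
> like (std:: types stand in for Impala's Thrift types, and the names here are
> illustrative; the real declarations use different container types, as the
> snippet below shows):
> {code:java}
> #include <map>
> #include <set>
> #include <string>
>
> // Illustrative stand-ins for Impala's Thrift types.
> typedef std::string TNetworkAddress;  // backend host:port
> typedef std::string TUniqueId;        // query id
>
> // Each known backend maps to the set of queries with fragments running on it.
> typedef std::map<TNetworkAddress, std::set<TUniqueId> > QueryLocations;
>
> int main() {
>   QueryLocations query_locations_;
>   query_locations_["impalad-2:22000"].insert("query-42");
>   // A backend absent from the statestore membership is treated as failed, and
>   // every query in its set becomes a cancellation candidate.
>   return 0;
> }
> {code}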
> If the cancellation queue is too full (bounded by
> MAX_CANCELLATION_QUEUE_SIZE), we skip cancelling the queries, hoping that the
> queue will have drained by the next heartbeat so that we can retry the
> cancellation then.
> However, by that point we have already removed the failed backend from the
> query_locations_ map, so the next heartbeat will never find this backend and
> the queries running on it will never be cancelled.
> {code:java}
> // Maps from query id (to be cancelled) to a list of failed Impalads that are
> // the cause of the cancellation.
> map<TUniqueId, vector<TNetworkAddress> > queries_to_cancel; // VVVV: LOCAL MAP
> {
>   // Build a list of queries that are running on failed hosts (as evidenced by
>   // their absence from the membership list).
>   // TODO: crash-restart failures can give false negatives for failed Impala
>   // daemons.
>   lock_guard<mutex> l(query_locations_lock_);
>   QueryLocations::const_iterator loc_entry = query_locations_.begin();
>   while (loc_entry != query_locations_.end()) {
>     if (current_membership.find(loc_entry->first) ==
>         current_membership.end()) {
>       unordered_set<TUniqueId>::const_iterator query_id =
>           loc_entry->second.begin();
>       // Add failed backend locations to all queries that ran on that backend.
>       for (; query_id != loc_entry->second.end(); ++query_id) {
>         vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>         failed_hosts.push_back(loc_entry->first);
>       }
>
>       exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first);
>       // We can remove the location wholesale once we know the backend has
>       // failed. To do so safely during iteration, we have to be careful not
>       // to invalidate the current iterator, so copy the iterator to do the
>       // erase(..) and advance the original.
>       QueryLocations::const_iterator failed_backend = loc_entry;
>       ++loc_entry;
>       // VVVV: WE ERASE THE ENTRY FROM THE GLOBAL MAP HERE.
>       query_locations_.erase(failed_backend);
>     } else {
>       ++loc_entry;
>     }
>   }
> }
> if (cancellation_thread_pool_->GetQueueSize() + queries_to_cancel.size() >
>     MAX_CANCELLATION_QUEUE_SIZE) {
>   // Ignore the cancellations - we'll be able to process them on the next
>   // heartbeat instead.
>   LOG_EVERY_N(WARNING, 60) << "Cancellation queue is full";
>   // VVVV: WE DON'T CANCEL HERE AND BY THE NEXT HEARTBEAT, WE WON'T FIND THE
>   // FAILED BACKEND AGAIN.
> }
> {code}
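>
> One possible shape of a fix (a sketch only, reusing the names above; not the
> actual patch, and the issue was ultimately closed as Won't Fix): build the
> cancellation list and check the queue capacity first, and only erase entries
> from query_locations_ once the cancellations can actually be enqueued:
> {code:java}
> // Sketch: defer the erase until we know the cancellations fit in the queue,
> // so a saturated queue leaves query_locations_ intact for the next heartbeat.
> map<TUniqueId, vector<TNetworkAddress> > queries_to_cancel;
> vector<TNetworkAddress> failed_backends;
> {
>   lock_guard<mutex> l(query_locations_lock_);
>   for (QueryLocations::const_iterator loc_entry = query_locations_.begin();
>        loc_entry != query_locations_.end(); ++loc_entry) {
>     if (current_membership.find(loc_entry->first) ==
>         current_membership.end()) {
>       failed_backends.push_back(loc_entry->first);
>       unordered_set<TUniqueId>::const_iterator query_id =
>           loc_entry->second.begin();
>       for (; query_id != loc_entry->second.end(); ++query_id) {
>         queries_to_cancel[*query_id].push_back(loc_entry->first);
>       }
>     }
>   }
>   if (cancellation_thread_pool_->GetQueueSize() + queries_to_cancel.size() >
>       MAX_CANCELLATION_QUEUE_SIZE) {
>     // Queue is full: leave query_locations_ untouched so that the next
>     // heartbeat rediscovers the failed backends and retries the cancellation.
>     LOG_EVERY_N(WARNING, 60) << "Cancellation queue is full";
>     return;
>   }
>   // Only now is it safe to forget the failed backends.
>   // (CloseConnections() on each failed backend is omitted for brevity.)
>   for (size_t i = 0; i < failed_backends.size(); ++i) {
>     query_locations_.erase(failed_backends[i]);
>   }
> }
> {code}
> The trade-off is that query_locations_ may keep entries for dead backends for
> a few extra heartbeats while the queue is saturated, but the cancellations are
> merely delayed rather than silently dropped.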
--
This message was sent by Atlassian Jira
(v8.3.4#803005)