[ 
https://issues.apache.org/jira/browse/IMPALA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-6907.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0
                   Impala 2.13.0

> ImpalaServer::MembershipCallback() may not remove all stale connections to 
> disconnected Impalad nodes
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6907
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6907
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, 
> Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Currently, {{ImpalaServer::MembershipCallback()}} will remove stale 
> connections to hosts which were removed from the cluster membership.
> {noformat}
>       while (loc_entry != query_locations_.end()) {
>         if (current_membership.find(loc_entry->first) == current_membership.end()) {
>           unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
>           // Add failed backend locations to all queries that ran on that backend.
>           for(; query_id != loc_entry->second.end(); ++query_id) {
>             vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>             failed_hosts.push_back(loc_entry->first);
>           }
>           exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first); <<<-----
> {noformat}
> However, this cleanup relies on checking against {{query_locations_}}, which is 
> populated only when the Impalad node acts as a coordinator and is currently 
> running queries on the disconnected backend. So 
> {{ImpalaServer::MembershipCallback()}} will not reliably remove stale 
> connections to hosts removed from the cluster. Stale connections may then 
> stay in the connection cache for an extended period of time, leading to query 
> failures when they are reused after the removed hosts rejoin the cluster.
> Instead, we should remove stale connections regardless of whether this node 
> happens to be currently coordinating a query using that backend.
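
The proposed behavior can be illustrated with a minimal, self-contained sketch. This is not the actual Impala patch: Address, FakeClientCache, CloseStaleConnections() and ExampleUsage() are hypothetical stand-ins, and comparing a previous membership snapshot against the current one is an assumption about how departed hosts might be detected. The only point it shows is that the decision to close connections never consults query_locations_.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for Impala's TNetworkAddress, written as "host:port".
using Address = std::string;

// Hypothetical stand-in for the impalad client cache. It only records which
// addresses had their cached connections closed.
struct FakeClientCache {
  std::vector<Address> closed;
  void CloseConnections(const Address& addr) { closed.push_back(addr); }
};

// Closes cached connections to every backend that appears in the previous
// membership snapshot but not in the current one. query_locations_ is not
// consulted, so the outcome does not depend on whether this node is
// coordinating a query on the departed backend.
// Returns the number of backends whose connections were closed.
std::size_t CloseStaleConnections(const std::set<Address>& previous_membership,
                                  const std::set<Address>& current_membership,
                                  FakeClientCache* cache) {
  std::size_t num_closed = 0;
  for (const Address& addr : previous_membership) {
    if (current_membership.find(addr) == current_membership.end()) {
      cache->CloseConnections(addr);
      ++num_closed;
    }
  }
  return num_closed;
}

// Example: backend "b:22000" leaves the cluster while this node has no
// queries running on it; its cached connections are still closed.
void ExampleUsage() {
  FakeClientCache cache;
  const std::set<Address> previous = {"a:22000", "b:22000"};
  const std::set<Address> current = {"a:22000"};
  const std::size_t num_closed = CloseStaleConnections(previous, current, &cache);
  assert(num_closed == 1);
  assert(cache.closed.front() == "b:22000");
}
```

In this sketch, cancelling queries that ran on the failed backend would remain a separate step still driven by query_locations_; only the connection cleanup is decoupled from it.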


