[ https://issues.apache.org/jira/browse/IMPALA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Ho resolved IMPALA-6907.
--------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.1.0, Impala 2.13.0
> ImpalaServer::MembershipCallback() may not remove all stale connections to disconnected Impalad nodes
> -----------------------------------------------------------------------------------------------------
>
> Key: IMPALA-6907
> URL: https://issues.apache.org/jira/browse/IMPALA-6907
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Currently, {{ImpalaServer::MembershipCallback()}} closes stale connections
> to hosts that have been removed from the cluster membership:
> {noformat}
> while (loc_entry != query_locations_.end()) {
>   if (current_membership.find(loc_entry->first) == current_membership.end()) {
>     unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
>     // Add failed backend locations to all queries that ran on that backend.
>     for (; query_id != loc_entry->second.end(); ++query_id) {
>       vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>       failed_hosts.push_back(loc_entry->first);
>     }
>
>     exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first);  <<<-----
> {noformat}
> However, it relies on checking against {{query_locations_}}, which is
> populated only when the Impalad node acts as a coordinator and is currently
> running queries on the disconnected backend. As a result,
> {{ImpalaServer::MembershipCallback()}} does not reliably remove stale
> connections to hosts removed from the cluster. Stale connections may then stay
> in the connection cache for an extended period of time, and queries can fail
> after the removed hosts rejoin the cluster because the stale connections are
> still used.
> Instead, we should remove stale connections regardless of whether this node
> happens to be currently coordinating a query using that backend.
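The gap described above can be modeled with a small, self-contained C++ sketch. It is not Impala code: `Address`, `QueryId`, `ClientCacheModel`, `MembershipCallbackCurrent`, and `MembershipCallbackFixed` are hypothetical simplifications (strings stand in for `TNetworkAddress` and `TUniqueId`), and the "fixed" variant assumes the previous membership is available to diff against the current one. The actual fix may track removed backends differently.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins: a backend is a "host:port" string (in place of
// TNetworkAddress) and a query id is a string (in place of TUniqueId).
using Address = std::string;
using QueryId = std::string;
using QueryLocations = std::map<Address, std::unordered_set<QueryId>>;
using QueriesToCancel = std::map<QueryId, std::vector<Address>>;

// Hypothetical model of the impalad client cache: it only records which
// backends still have cached connections.
struct ClientCacheModel {
  std::set<Address> cached;
  void CloseConnections(const Address& addr) { cached.erase(addr); }
};

// Models the current behavior: only backends present in query_locations are
// examined, so a removed backend on which this node coordinates no queries
// keeps its cached connections.
void MembershipCallbackCurrent(const std::set<Address>& current_membership,
                               const QueryLocations& query_locations,
                               QueriesToCancel& queries_to_cancel,
                               ClientCacheModel& cache) {
  for (const auto& loc_entry : query_locations) {
    if (current_membership.count(loc_entry.first) == 0) {
      // Add the failed backend to every query that ran on it.
      for (const QueryId& query_id : loc_entry.second) {
        queries_to_cancel[query_id].push_back(loc_entry.first);
      }
      cache.CloseConnections(loc_entry.first);
    }
  }
}

// Models the proposed behavior: close connections to every backend that left
// the membership (previous minus current), independently of query_locations,
// then collect the queries to cancel as before.
void MembershipCallbackFixed(const std::set<Address>& previous_membership,
                             const std::set<Address>& current_membership,
                             const QueryLocations& query_locations,
                             QueriesToCancel& queries_to_cancel,
                             ClientCacheModel& cache) {
  for (const Address& addr : previous_membership) {
    if (current_membership.count(addr) == 0) cache.CloseConnections(addr);
  }
  for (const auto& loc_entry : query_locations) {
    if (current_membership.count(loc_entry.first) == 0) {
      for (const QueryId& query_id : loc_entry.second) {
        queries_to_cancel[query_id].push_back(loc_entry.first);
      }
    }
  }
}
```

With backends `a`, `b`, and `c` in the old membership, only `a` in the new one, and this node coordinating a query on `c` alone, the current model leaves the cached connection to `b` in place. The fixed model closes the connections to both `b` and `c`. Query cancellation still depends only on `query_locations`.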
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)