[ https://issues.apache.org/jira/browse/IMPALA-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-414.
----------------------------------
Resolution: Duplicate
I think this describes the same problem as IMPALA-2990, since we only need to
detect crash-restart failures reliably in order to kill queries that were
executing on the failed node.
> Impala server cannot detect crash-restart failures reliably
> -----------------------------------------------------------
>
> Key: IMPALA-414
> URL: https://issues.apache.org/jira/browse/IMPALA-414
> Project: IMPALA
> Issue Type: Improvement
> Components: Distributed Exec
> Affects Versions: Impala 1.0.1
> Reporter: Henry Robinson
> Priority: Minor
> Labels: statestore
>
> The membership mechanism used to tell Impala servers about failures does not
> always detect fast crash-restarts. If a server restarts and re-registers
> before the state-store recognises that it has failed, the failure won't get
> reported to any other subscriber.
> The right way to fix this, I think, is to track a version number in every
> subscriber. When a subscriber reconnects, it gets a new version number. For
> every query, we record the highest version number known for each subscriber
> at the time the query is scheduled. Then if any backend executing the query
> has a higher version number, it has likely restarted since the query started.
> There might be a couple of false positives, since a node could conceivably
> restart between a scheduling assignment and actually receiving a query, but
> that's unlikely, and false positives are better than false negatives.
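A minimal Python sketch of the version-number scheme described in the issue, under stated assumptions: the names `Membership`, `register`, `schedule`, `restarted_backends` and `queries_to_cancel` are hypothetical and are not Impala's actual statestore API, and versions come from a single global counter shared by all subscribers. It shows why comparing membership sets misses a fast crash-restart while comparing per-query version snapshots catches it, and how the result could drive cancellation of affected queries.

```python
from dataclasses import dataclass


class Membership:
    """Hypothetical statestore-side registry, not Impala's real API.

    Every (re)registration gets a fresh, strictly increasing version number
    (assumption: one global counter shared by all subscribers).
    """

    def __init__(self):
        self._next_version = 1
        self.versions = {}  # subscriber id -> version of its latest registration

    def register(self, subscriber_id):
        self.versions[subscriber_id] = self._next_version
        self._next_version += 1
        return self.versions[subscriber_id]


@dataclass
class Query:
    query_id: str
    backend_versions: dict  # backend id -> version known when the query was scheduled


def schedule(query_id, backends, membership):
    # Snapshot each backend's current version at scheduling time.
    return Query(query_id, {b: membership.versions[b] for b in backends})


def restarted_backends(query, membership):
    """Backends that have likely restarted since the query was scheduled."""
    failed = []
    for backend, seen in query.backend_versions.items():
        current = membership.versions.get(backend)
        # Assumption: a backend that is no longer registered also counts as failed.
        if current is None or current > seen:
            failed.append(backend)
    return failed


def queries_to_cancel(queries, membership):
    # Any query with at least one restarted backend should be killed.
    return [q.query_id for q in queries if restarted_backends(q, membership)]


if __name__ == "__main__":
    m = Membership()
    for host in ("host-a:22000", "host-b:22000"):
        m.register(host)
    q1 = schedule("q1", ["host-a:22000", "host-b:22000"], m)
    q2 = schedule("q2", ["host-a:22000"], m)

    members_before = set(m.versions)
    # Fast crash-restart: host-b re-registers before any failure is noticed.
    m.register("host-b:22000")

    print(set(m.versions) == members_before)  # True: membership alone misses it
    print(restarted_backends(q1, m))          # ['host-b:22000']
    print(queries_to_cancel([q1, q2], m))     # ['q1']
```

Taking the snapshot at scheduling time is also where the false positive noted in the issue comes from: if a backend restarts after the snapshot but before it receives its work, the query is flagged even though the restarted backend may have handled it correctly.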
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)