[ https://issues.apache.org/jira/browse/IMPALA-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-414.
----------------------------------
    Resolution: Duplicate

I think this describes the same problem as IMPALA-2990: the only reason we 
need to detect crash-restart failures reliably is to kill queries that were 
executing on the failed node.

> Impala server cannot detect crash-restart failures reliably
> -----------------------------------------------------------
>
>                 Key: IMPALA-414
>                 URL: https://issues.apache.org/jira/browse/IMPALA-414
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 1.0.1
>            Reporter: Henry Robinson
>            Priority: Minor
>              Labels: statestore
>
> The membership mechanism used to tell Impala servers about failures does not 
> always detect fast crash-restarts. If a server restarts and re-registers 
> before the state-store recognises that it has failed, the failure won't get 
> reported to any other subscriber.
> The right way to fix this, I think, is to track a version number in every 
> subscriber. When a subscriber reconnects, it gets a new version number. For 
> every query, we track the highest version number of the subscriber known at 
> that time. Then if any backend executing a query has a higher version number, 
> it's likely to have restarted since the query started. There might be a 
> couple of false positives, since a node could conceivably restart between a 
> scheduling assignment and actually receiving a query, but that's unlikely and 
> better than false negatives.
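
The version-number scheme proposed in the quoted description can be sketched 
roughly as follows. This is only an illustration of the idea, not Impala's 
actual code: the names Membership, Query, start_query and restarted_backends 
are hypothetical, and the sketch assumes the version is tracked per backend 
and recorded for each query at scheduling time.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


class Membership:
    """Hypothetical stand-in for the state-store's membership table."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}

    def register(self, backend: str) -> int:
        # Every (re-)registration gets a new, higher version number, even if
        # the state-store never noticed that the backend had gone away.
        self._versions[backend] = self._versions.get(backend, 0) + 1
        return self._versions[backend]

    def version_of(self, backend: str) -> int:
        return self._versions[backend]


@dataclass
class Query:
    query_id: str
    # Version of each participating backend as known when the query started.
    versions_at_start: Dict[str, int] = field(default_factory=dict)


def start_query(query_id: str, backends: Iterable[str],
                membership: Membership) -> Query:
    """Record the currently known version of every backend in the query."""
    return Query(query_id, {b: membership.version_of(b) for b in backends})


def restarted_backends(query: Query, membership: Membership) -> List[str]:
    """Backends whose version is now higher than when the query started."""
    return sorted(
        b for b, v in query.versions_at_start.items()
        if membership.version_of(b) > v
    )


if __name__ == "__main__":
    m = Membership()
    m.register("impalad-1")
    m.register("impalad-2")
    q = start_query("q1", ["impalad-1", "impalad-2"], m)
    m.register("impalad-2")  # fast crash-restart, re-registers
    print(restarted_backends(q, m))
```

A query whose restarted_backends list is non-empty would be a candidate for 
cancellation. As the description notes, a restart that falls between the 
scheduling assignment and the backend receiving the query is also flagged, so 
the check can produce false positives but should not miss a restart.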



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
