[ 
https://issues.apache.org/jira/browse/IMPALA-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-7872:
-----------------------------------

    Assignee:     (was: Wenzhe Zhou)

> Extended health checks to mark node as down
> -------------------------------------------
>
>                 Key: IMPALA-7872
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7872
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: Availability, statestore
>
> This is an umbrella JIRA to improve handling of complex failure modes aside 
> from fail-stop. The current statestore heartbeat mechanism assumes that an 
> Impala daemon that responds to heartbeats is healthy and can be scheduled on. 
> Memory-based admission control provides a bit more robustness here by not 
> admitting queries on daemons where memory would be oversubscribed.
> Examples of failure modes of interest are:
> * Hangs, where a particular node can't make progress on some or all queries 
> (the JVM hangs in IMPALA-7483 are a good example).
> * Repeated fragment instance startup failures, e.g. where coordinators can't 
> successfully start fragments on an Impala daemon because of communication 
> errors or other issues.
> We can't automatically handle all failure modes, but we could improve 
> handling of some common ones, particularly repeated fragment startup failures 
> or hangs. The goal would be to degrade more gracefully to avoid repeated 
> failures causing a cluster-wide outage. The goal isn't to prevent all 
> failures, just to recover to a healthy state automatically in more scenarios.
> IMPALA-1760 (graceful shutdown) may give us some better options here, since 
> if a node notices that it is somehow unhealthy, it could gracefully remove 
> itself from scheduling and restart itself.
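
As an illustration of the "repeated fragment startup failures" case described above, the following is a minimal sketch of how a coordinator might stop scheduling on a daemon after several consecutive startup failures. It is not Impala's implementation: the class name StartupFailureTracker, its methods, and the max_consecutive_failures threshold are all hypothetical, and a real design would also need timeouts and a way to re-admit a node that has recovered.

```python
from collections import defaultdict


class StartupFailureTracker:
    """Hypothetical coordinator-side tracker (assumption, not Impala code).

    Counts consecutive fragment startup failures per daemon and reports
    a daemon as unschedulable once the count reaches a threshold.
    """

    def __init__(self, max_consecutive_failures=3):
        # Threshold is an illustrative default, not a real Impala flag.
        self.max_consecutive_failures = max_consecutive_failures
        self._failures = defaultdict(int)

    def record_success(self, node):
        # Any successful fragment startup resets the streak for that node.
        self._failures[node] = 0

    def record_failure(self, node):
        # A failed startup extends the streak of consecutive failures.
        self._failures[node] += 1

    def is_schedulable(self, node):
        # A node stays schedulable while its streak is below the threshold.
        return self._failures[node] < self.max_consecutive_failures

    def schedulable_nodes(self, nodes):
        # Filter a candidate list, preserving the original order.
        return [n for n in nodes if self.is_schedulable(n)]
```

Resetting the counter on success keeps a single transient error from permanently excluding a daemon, which matches the stated goal of degrading gracefully rather than preventing every failure.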



--
This message was sent by Atlassian Jira
(v8.20.10#820010)