[ https://issues.apache.org/jira/browse/IMPALA-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenzhe Zhou reassigned IMPALA-7872:
-----------------------------------
Assignee: (was: Wenzhe Zhou)
> Extended health checks to mark node as down
> -------------------------------------------
>
> Key: IMPALA-7872
> URL: https://issues.apache.org/jira/browse/IMPALA-7872
> Project: IMPALA
> Issue Type: Improvement
> Components: Distributed Exec
> Reporter: Tim Armstrong
> Priority: Major
> Labels: Availability, statestore
>
> This is an umbrella JIRA to improve handling of complex failure modes aside
> from fail-stop. The current statestore heartbeat mechanism assumes that an
> Impala daemon that responds to heartbeats is healthy and can have queries
> scheduled on it.
> Memory-based admission control provides a bit more robustness here by not
> admitting queries on daemons where memory would be oversubscribed.
> Examples of failure modes of interest are:
> * Hangs, where a particular node can't make progress on some or all queries
> (the JVM hangs in IMPALA-7483 are a good example).
> * Repeated fragment instance startup failures, e.g. where coordinators can't
> successfully start fragments on an Impala daemon because of communication
> errors or other issues.
> We can't automatically handle all failure modes, but we could improve
> handling of some common ones, particularly repeated fragment startup failures
> or hangs. The goal would be to degrade more gracefully to avoid repeated
> failures causing a cluster-wide outage. The goal isn't to prevent all
> failures, just to recover to a healthy state automatically in more scenarios.
> IMPALA-1760 (graceful shutdown) may give us some better options here, since
> if a node notices that it is somehow unhealthy, it could gracefully remove
> itself from scheduling and restart itself.
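
One way to picture the handling of repeated fragment startup failures
described above is a coordinator-side tracker that temporarily stops
scheduling on a daemon after several consecutive failures, then lets it back
in automatically. The following is only a minimal Python sketch under assumed
names: StartupFailureTracker, its thresholds, and its methods are hypothetical
and do not correspond to Impala code or flags.

```python
from dataclasses import dataclass, field


@dataclass
class BackendHealth:
    """Per-daemon bookkeeping kept by a coordinator (hypothetical)."""
    consecutive_failures: int = 0
    excluded_until: float = 0.0  # timestamp; 0.0 means "not excluded"


@dataclass
class StartupFailureTracker:
    # Both thresholds are illustrative values, not Impala defaults.
    max_consecutive_failures: int = 3
    exclusion_seconds: float = 60.0
    backends: dict = field(default_factory=dict)

    def _get(self, host):
        return self.backends.setdefault(host, BackendHealth())

    def record_success(self, host):
        # Any successful fragment startup resets the failure streak.
        self._get(host).consecutive_failures = 0

    def record_failure(self, host, now):
        state = self._get(host)
        state.consecutive_failures += 1
        if state.consecutive_failures >= self.max_consecutive_failures:
            # Exclude the daemon for a while instead of forever, so it
            # can return to scheduling without operator action.
            state.excluded_until = now + self.exclusion_seconds
            state.consecutive_failures = 0

    def is_schedulable(self, host, now):
        return now >= self._get(host).excluded_until

    def schedulable(self, hosts, now):
        # Keep the caller's ordering; drop currently excluded daemons.
        return [h for h in hosts if self.is_schedulable(h, now)]
```

The time-bounded exclusion reflects the stated goal of degrading gracefully
and recovering automatically rather than preventing every failure. The
timestamps are passed in explicitly so the behavior is easy to test.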
--
This message was sent by Atlassian Jira
(v8.20.10#820010)