[
https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-1760:
----------------------------------
Description:
In larger clusters, node maintenance is a frequent occurrence. Currently there
is no way to stop an Impala node without either failing its running queries or
first draining queries across the whole cluster. We should fix that.
was:
In larger clusters, node maintenance is a frequent occurrence. Currently there
is no way to stop an Impala node without either failing its running queries or
first draining queries across the whole cluster. We should fix that.
Here's a proposal:
* Add a {{Decommission}} RPC to ImpalaServer. Calling this causes an Impala
daemon to stop accepting new fragments or queries.
* The Impala daemon should mark its entry in the membership statestore topic as
'decommissioning'. This tells other Impala daemons not to try to assign work to
it.
* Once the running queries / fragments have finished (or maybe after a timeout
has elapsed?), the Impala daemon will remove itself entirely from the
statestore membership topic and enter 'offline mode'.
* Either {{Decommission()}} returns at that point, or it returns earlier and the
caller checks the statestore topic to see when the daemon has gone offline.
* Any Impala daemon that's in the process of sending work to a decommissioning
node (because of the race between the {{Decommission()}} call and every node
getting the statestore update) should retry the query from the point of
scheduling. It should only do this, say, three times before aborting the query.
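
The proposal above could be sketched roughly as follows. This is an
illustrative Python model, not Impala code: the names DaemonState, Daemon,
DecommissioningError, schedule_with_retry and MAX_SCHEDULING_RETRIES are all
hypothetical, the statestore membership topic is modeled as a plain dict, and
the optional drain timeout is left out.

```python
from enum import Enum

MAX_SCHEDULING_RETRIES = 3  # the "say, three times" from the proposal


class DaemonState(Enum):
    ACTIVE = "active"
    DECOMMISSIONING = "decommissioning"
    OFFLINE = "offline"


class DecommissioningError(Exception):
    """Raised when work is sent to a daemon that no longer accepts it."""


class Daemon:
    def __init__(self, name, membership):
        self.name = name
        self.membership = membership  # stand-in for the statestore topic
        self.state = DaemonState.ACTIVE
        self.running = set()
        membership[name] = self.state

    def decommission(self):
        # Stop accepting new work and advertise that in the membership topic.
        self.state = DaemonState.DECOMMISSIONING
        self.membership[self.name] = self.state
        self._maybe_go_offline()

    def start_fragment(self, fragment_id):
        if self.state is not DaemonState.ACTIVE:
            raise DecommissioningError(self.name)
        self.running.add(fragment_id)

    def finish_fragment(self, fragment_id):
        self.running.discard(fragment_id)
        self._maybe_go_offline()

    def _maybe_go_offline(self):
        # Once drained, remove the entry from the membership topic entirely.
        if self.state is DaemonState.DECOMMISSIONING and not self.running:
            self.state = DaemonState.OFFLINE
            self.membership.pop(self.name, None)


def schedule_with_retry(fragment_id, daemons, cached_view, refresh_view):
    """Coordinator side: cached_view may be stale, so a send can hit a
    decommissioning daemon. On that error, refresh the view and reschedule,
    giving up after MAX_SCHEDULING_RETRIES retries."""
    for _ in range(1 + MAX_SCHEDULING_RETRIES):
        candidates = sorted(
            name for name, state in cached_view.items()
            if state is DaemonState.ACTIVE
        )
        if not candidates:
            break
        target = candidates[0]
        try:
            daemons[target].start_fragment(fragment_id)
            return target
        except DecommissioningError:
            cached_view = refresh_view()
    raise RuntimeError("aborting query: no daemon accepted " + fragment_id)


if __name__ == "__main__":
    membership = {}
    daemons = {n: Daemon(n, membership) for n in ("a", "b")}
    daemons["a"].start_fragment("f1")
    stale = dict(membership)          # coordinator's view before the update
    daemons["a"].decommission()       # "a" is now draining
    print(schedule_with_retry("f2", daemons, stale, lambda: dict(membership)))
    daemons["a"].finish_fragment("f1")
    print(daemons["a"].state, sorted(membership))
```

In this model the coordinator first tries daemon "a" from its stale view, gets
the error, refreshes, and lands on "b"; "a" leaves the membership dict only
after its last running fragment finishes.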
> Add decommissioning support / graceful shutdown / quiesce
> ---------------------------------------------------------
>
> Key: IMPALA-1760
> URL: https://issues.apache.org/jira/browse/IMPALA-1760
> Project: IMPALA
> Issue Type: New Feature
> Components: Distributed Exec
> Affects Versions: Impala 2.1.1
> Reporter: Henry Robinson
> Assignee: Tim Armstrong
> Priority: Critical
> Labels: resource-management, scalability, scheduler, usability
>
> In larger clusters, node maintenance is a frequent occurrence. Currently there
> is no way to stop an Impala node without either failing its running queries or
> first draining queries across the whole cluster. We should fix that.