[ 
https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1760:
----------------------------------
    Description: 
In larger clusters, node maintenance is a frequent occurrence. There is 
currently no way to stop an Impala node without failing its running queries, 
other than draining queries across the whole cluster first. We should fix that.


  was:
In larger clusters, node maintenance is a frequent occurrence. There is 
currently no way to stop an Impala node without failing its running queries, 
other than draining queries across the whole cluster first. We should fix that.

Here's a proposal:

* Add a {{Decommission}} RPC to ImpalaServer. Calling this causes an Impala 
daemon to stop accepting new fragments or queries.
* The Impala daemon should mark its entry in the membership statestore topic as 
'decommissioning'. This tells other Impala daemons not to try to assign work to 
it.
* Once the running queries / fragments have finished (or maybe after a timeout 
has elapsed?), the Impala daemon will remove itself entirely from the 
statestore membership topic and enter 'offline mode'. 
* Either {{Decommission()}} returns at that point, or it returns immediately 
and the caller checks the statestore topic to see when the node has left.
* Any Impala daemon that's in the process of sending work to a decommissioning 
node (because of the race between the {{Decommission()}} call and every node 
getting the statestore update) should retry the query from the point of 
scheduling. It should only do this, say, three times before aborting the query.
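
A rough sketch of the proposal above, in Python rather than Impala's actual 
C++ code, may make the state transitions and the retry race easier to follow. 
Every name here ({{Daemon}}, {{NodeState}}, {{run_query}}, {{MAX_RETRIES}}) is 
hypothetical, the statestore membership topic is modelled as a plain dict, and 
the optional drain timeout is left out. "Three times" is read as three retries 
after the first attempt, which is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"
    DECOMMISSIONING = "decommissioning"
    OFFLINE = "offline"


class NodeUnavailable(Exception):
    """Raised when a daemon refuses new work."""


@dataclass
class Daemon:
    name: str
    membership: dict  # stand-in for the statestore membership topic
    running: int = 0  # fragments currently executing on this daemon
    state: NodeState = NodeState.ACTIVE

    def __post_init__(self):
        # Register in the membership topic on startup.
        self.membership[self.name] = self.state

    def decommission(self):
        # Stop accepting work and advertise the new state to other daemons.
        self.state = NodeState.DECOMMISSIONING
        self.membership[self.name] = self.state
        self._maybe_go_offline()

    def start_fragment(self):
        if self.state is not NodeState.ACTIVE:
            raise NodeUnavailable(self.name)
        self.running += 1

    def finish_fragment(self):
        self.running -= 1
        self._maybe_go_offline()

    def _maybe_go_offline(self):
        # Once drained, leave the membership topic entirely.
        if self.state is NodeState.DECOMMISSIONING and self.running == 0:
            self.state = NodeState.OFFLINE
            self.membership.pop(self.name, None)


MAX_RETRIES = 3  # assumed reading of "three times"


def run_query(daemons, cached_view, membership):
    """Schedule one fragment; cached_view may lag behind membership."""
    view = dict(cached_view)
    for attempt in range(1, MAX_RETRIES + 2):
        candidates = sorted(n for n, s in view.items() if s is NodeState.ACTIVE)
        if not candidates:
            raise RuntimeError("no active nodes to schedule on")
        target = candidates[0]
        try:
            daemons[target].start_fragment()
            return target, attempt
        except NodeUnavailable:
            # Lost the race with Decommission(): refresh the view and
            # retry from the point of scheduling.
            view = dict(membership)
    raise RuntimeError("query aborted: scheduling retries exhausted")
```

For example, with two daemons {{a}} and {{b}}, a coordinator holding a view 
taken before {{a.decommission()}} is first refused by {{a}}, refreshes its 
view, and lands on {{b}} on the second attempt. {{a}} drops out of the 
membership dict only after its last running fragment finishes.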


> Add decommissioning support / graceful shutdown / quiesce
> ---------------------------------------------------------
>
>                 Key: IMPALA-1760
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1760
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Distributed Exec
>    Affects Versions: Impala 2.1.1
>            Reporter: Henry Robinson
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: resource-management, scalability, scheduler, usability
>
> In larger clusters, node maintenance is a frequent occurrence. There is 
> currently no way to stop an Impala node without failing its running queries, 
> other than draining queries across the whole cluster first. We should fix that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)