[jira] [Comment Edited] (IMPALA-12382) Coordinator could schedule fragments on gracefully shutdown executors

Wenzhe Zhou (Jira) Thu, 24 Aug 2023 11:39:05 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758687#comment-17758687
 ]


Wenzhe Zhou edited comment on IMPALA-12382 at 8/24/23 6:38 PM:
---------------------------------------------------------------

If the executor is removed from the cluster membership by statestore when 
receiving un-registering request, it could affect running queries. Coordinators 
cancel the queries which are running on failed executors (as evidenced by their 
absence from the membership list). See 
[ImpalaServer::CancelQueriesOnFailedBackends()|https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L2365-L2375].

It seems we already have 
[mechanism|https://github.com/apache/impala/blob/master/be/src/service/impala-server.h#L124-L126]
 to avoid scheduling new task on the executors which are shutting down by 
marking the executor in "quiescing" state. 


was (Author: wzhou):
If the executor is removed from the cluster membership by statestore when 
receiving un-registering request, it could affect running queries. Coordinators 
cancel the queries which are running on failed executors (as evidenced by their 
absence from the membership list). See 
[ImpalaServer::CancelQueriesOnFailedBackends()|https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L2365-L2375].


> Coordinator could schedule fragments on gracefully shutdown executors
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-12382
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12382
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Abhishek Rawat
>            Assignee: Wenzhe Zhou
>            Priority: Critical
>
> Statestore does failure detection based on consecutive heartbeat failures. 
> This is by default configured to be 10 (statestore_max_missed_heartbeats) at 
> 1 second intervals (statestore_heartbeat_frequency_ms). This could however 
> take much longer than 10 seconds overall, especially if statestore is busy 
> and due to rpc timeout duration.
> In the following example it took 50 seconds for failure detection:
> {code:java}
> I0817 12:32:06.824721    86 statestore.cc:1157] Unable to send heartbeat 
> message to subscriber 
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
>  received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected 
> exception: No more data to read., type: 
> N6apache6thrift9transport19TTransportExceptionE, rpc: 
> N6impala18THeartbeatResponseE, send: done
> I0817 12:32:06.824741    86 failure-detector.cc:91] 1 consecutive heartbeats 
> failed for 
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
>  State is OK
> .....
> .....
> .....
> I0817 12:32:56.800251    83 statestore.cc:1157] Unable to send heartbeat 
> message to subscriber 
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
>  received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected 
> exception: No more data to read., type: 
> N6apache6thrift9transport19TTransportExceptionE, rpc: 
> N6impala18THeartbeatResponseE, send: done 
> I0817 12:32:56.800267    83 failure-detector.cc:91] 10 consecutive heartbeats 
> failed for 
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
>  State is FAILED
> I0817 12:32:56.800276    83 statestore.cc:1168] Subscriber 
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'
>  has failed, disconnected or re-registered (last known registration ID: 
> c84bf70f03acda2b:b34a812c5e96e687){code}
> As a result there is a window when statestore is determining node failure and 
> coordinator might schedule fragments on that particular executor(s). The exec 
> RPC will fail and if transparent query retries is enabled, coordinator will 
> immediately retry the query and it will fail again.
> Ideally in such situations coordinator should be notified sooner about a 
> failed executor. Statestore could send priority topic update to coordinator 
> when it enters failure detection logic. This should reduce the chances of 
> coordinator scheduling query fragment on a failed executor.
> The other argument could be to tune the heartbeat frequency and interval 
> parameters. But, it's hard to find configuration which works for all cases. 
> And, so while the default values are reasonable, under certain conditions 
> they could be unreasonable as seen in the above example.
> It might make sense to especially handle the case where executors are 
> shutdown gracefully and in such case statestore shouldn't do failure 
> detection and instead fail these executor immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (IMPALA-12382) Coordinator could schedule fragments on gracefully shutdown executors

Reply via email to