[
https://issues.apache.org/jira/browse/IMPALA-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758687#comment-17758687
]
Wenzhe Zhou edited comment on IMPALA-12382 at 8/24/23 6:38 PM:
---------------------------------------------------------------
If the statestore removes an executor from the cluster membership as soon as it
receives the un-registering request, running queries could be affected.
Coordinators cancel queries that are running on failed executors (as evidenced
by their absence from the membership list). See
[ImpalaServer::CancelQueriesOnFailedBackends()|https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L2365-L2375].
It seems we already have a
[mechanism|https://github.com/apache/impala/blob/master/be/src/service/impala-server.h#L124-L126]
to avoid scheduling new tasks on executors that are shutting down: the
executor is marked as being in the "quiescing" state.
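
A minimal sketch of the two behaviors described above, written in Python rather than Impala's C++ and with hypothetical names throughout (`Executor`, `schedulable` and `queries_to_cancel` are illustrative, not Impala APIs). Executors marked as quiescing stay in the membership but receive no new fragments, while a query is cancelled only when one of its executors is missing from the membership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Executor:
    # Hypothetical stand-in for a backend entry in a membership update.
    address: str
    is_quiescing: bool = False


def schedulable(membership):
    """Executors eligible for new fragments: present and not quiescing."""
    return [e.address for e in membership.values() if not e.is_quiescing]


def queries_to_cancel(running, membership):
    """Queries with at least one executor absent from the membership."""
    return sorted(q for q, addrs in running.items()
                  if any(a not in membership for a in addrs))


membership = {
    "exec-1": Executor("exec-1"),
    "exec-2": Executor("exec-2", is_quiescing=True),  # shutting down gracefully
}
running = {"q1": {"exec-1", "exec-2"}, "q2": {"exec-1", "exec-3"}}

print(schedulable(membership))
print(queries_to_cancel(running, membership))
```

In this toy data `q1` keeps running because `exec-2` is still a member while it drains, whereas `q2` is cancelled because `exec-3` is absent. Removing an executor from the membership as soon as it un-registers would move its running queries into the second group.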
> Coordinator could schedule fragments on gracefully shutdown executors
> ---------------------------------------------------------------------
>
> Key: IMPALA-12382
> URL: https://issues.apache.org/jira/browse/IMPALA-12382
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Abhishek Rawat
> Assignee: Wenzhe Zhou
> Priority: Critical
>
> Statestore does failure detection based on consecutive heartbeat failures.
> By default, a subscriber is marked as failed after 10 missed heartbeats
> (statestore_max_missed_heartbeats) sent at 1 second intervals
> (statestore_heartbeat_frequency_ms). Detection can however take much longer
> than 10 seconds overall, especially if statestore is busy or heartbeat RPCs
> run up to their rpc timeout.
> In the following example it took 50 seconds for failure detection:
> {code:java}
> I0817 12:32:06.824721 86 statestore.cc:1157] Unable to send heartbeat
> message to subscriber
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
> received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected
> exception: No more data to read., type:
> N6apache6thrift9transport19TTransportExceptionE, rpc:
> N6impala18THeartbeatResponseE, send: done
> I0817 12:32:06.824741 86 failure-detector.cc:91] 1 consecutive heartbeats
> failed for
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
> State is OK
> .....
> .....
> .....
> I0817 12:32:56.800251 83 statestore.cc:1157] Unable to send heartbeat
> message to subscriber
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
> received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected
> exception: No more data to read., type:
> N6apache6thrift9transport19TTransportExceptionE, rpc:
> N6impala18THeartbeatResponseE, send: done
> I0817 12:32:56.800267 83 failure-detector.cc:91] 10 consecutive heartbeats
> failed for
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
> State is FAILED
> I0817 12:32:56.800276 83 statestore.cc:1168] Subscriber
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'
> has failed, disconnected or re-registered (last known registration ID:
> c84bf70f03acda2b:b34a812c5e96e687){code}
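
The 50 second figure can be checked against the log with a little arithmetic. The snippet below is only a worked calculation on the two failure-detector.cc timestamps shown above; it assumes both lines were logged on the same day and makes no claim about how the statestore schedules heartbeats.

```python
from datetime import datetime

# Timestamps copied from the failure-detector.cc lines in the log above.
first_miss = datetime.strptime("12:32:06.824741", "%H:%M:%S.%f")
tenth_miss = datetime.strptime("12:32:56.800267", "%H:%M:%S.%f")

elapsed_s = (tenth_miss - first_miss).total_seconds()
# Ten misses span nine gaps between consecutive heartbeat attempts.
avg_gap_s = elapsed_s / 9

print(f"elapsed: {elapsed_s:.3f} s, average gap: {avg_gap_s:.3f} s")
```

The gap between attempts averages well over five seconds instead of the nominal one second, which is how ten missed heartbeats stretch to roughly 50 seconds.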
> As a result, there is a window while statestore is still determining the
> node failure in which the coordinator might schedule fragments on that
> executor. The exec RPC will fail and, if transparent query retries are
> enabled, the coordinator will immediately retry the query and it will fail
> again.
> Ideally, in such situations the coordinator should be notified sooner about
> a failed executor. Statestore could send a priority topic update to the
> coordinator when it enters the failure detection logic. This should reduce
> the chances of the coordinator scheduling query fragments on a failed
> executor.
> The other option could be to tune the heartbeat parameters
> (statestore_heartbeat_frequency_ms and statestore_max_missed_heartbeats).
> But it is hard to find a configuration that works for all cases. So while
> the default values are reasonable, under certain conditions they can be
> unreasonable, as seen in the above example.
> It might make sense to handle specially the case where executors are shut
> down gracefully: in that case statestore shouldn't do failure detection and
> should instead fail these executors immediately.
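
The proposal in the last paragraph of the description can be sketched as follows. This is an illustrative Python model, not Impala's failure detector: the class `MissedHeartbeatDetector` and its methods are hypothetical, and only the threshold of 10 comes from the description above.

```python
class MissedHeartbeatDetector:
    """Toy model: fail a peer after N consecutive misses, or at once on request."""

    def __init__(self, max_missed=10):  # 10 mirrors statestore_max_missed_heartbeats
        self.max_missed = max_missed
        self.missed = {}
        self.failed = set()

    def record_heartbeat(self, peer, ok):
        """Update the miss counter for `peer` and return its state."""
        if peer in self.failed:
            return "FAILED"
        self.missed[peer] = 0 if ok else self.missed.get(peer, 0) + 1
        if self.missed[peer] >= self.max_missed:
            self.failed.add(peer)
            return "FAILED"
        return "OK"

    def unregister(self, peer):
        """Graceful shutdown: skip the counting phase and fail the peer now."""
        self.failed.add(peer)
        return "FAILED"


detector = MissedHeartbeatDetector()
states = [detector.record_heartbeat("exec-1", ok=False) for _ in range(10)]
print(states[0], states[-1])          # crash path: nine misses tolerated
print(detector.unregister("exec-2"))  # graceful path: no waiting
```

The crash path reproduces the pattern in the log (state stays OK until the tenth miss), while the graceful path fails the peer on the first call. Whether failing a peer should also drop it from the membership right away is the concern raised in the comment at the top of this message.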