[
https://issues.apache.org/jira/browse/IMPALA-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenzhe Zhou resolved IMPALA-12382.
----------------------------------
Resolution: Won't Fix
We already have mechanism to avoid scheduling new task on the executors which
are shutting down.
> Coordinator could schedule fragments on gracefully shutdown executors
> ---------------------------------------------------------------------
>
> Key: IMPALA-12382
> URL: https://issues.apache.org/jira/browse/IMPALA-12382
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Abhishek Rawat
> Assignee: Wenzhe Zhou
> Priority: Critical
>
> Statestore does failure detection based on consecutive heartbeat failures.
> This is by default configured to be 10 (statestore_max_missed_heartbeats) at
> 1 second intervals (statestore_heartbeat_frequency_ms). This could however
> take much longer than 10 seconds overall, especially if statestore is busy
> and due to rpc timeout duration.
> In the following example it took 50 seconds for failure detection:
> {code:java}
> I0817 12:32:06.824721 86 statestore.cc:1157] Unable to send heartbeat
> message to subscriber
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
> received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected
> exception: No more data to read., type:
> N6apache6thrift9transport19TTransportExceptionE, rpc:
> N6impala18THeartbeatResponseE, send: done
> I0817 12:32:06.824741 86 failure-detector.cc:91] 1 consecutive heartbeats
> failed for
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
> State is OK
> .....
> .....
> .....
> I0817 12:32:56.800251 83 statestore.cc:1157] Unable to send heartbeat
> message to subscriber
> impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
> received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected
> exception: No more data to read., type:
> N6apache6thrift9transport19TTransportExceptionE, rpc:
> N6impala18THeartbeatResponseE, send: done
> I0817 12:32:56.800267 83 failure-detector.cc:91] 10 consecutive heartbeats
> failed for
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
> State is FAILED
> I0817 12:32:56.800276 83 statestore.cc:1168] Subscriber
> 'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'
> has failed, disconnected or re-registered (last known registration ID:
> c84bf70f03acda2b:b34a812c5e96e687){code}
> As a result there is a window when statestore is determining node failure and
> coordinator might schedule fragments on that particular executor(s). The exec
> RPC will fail and if transparent query retries is enabled, coordinator will
> immediately retry the query and it will fail again.
> Ideally in such situations coordinator should be notified sooner about a
> failed executor. Statestore could send priority topic update to coordinator
> when it enters failure detection logic. This should reduce the chances of
> coordinator scheduling query fragment on a failed executor.
> The other argument could be to tune the heartbeat frequency and interval
> parameters. But, it's hard to find configuration which works for all cases.
> And, so while the default values are reasonable, under certain conditions
> they could be unreasonable as seen in the above example.
> It might make sense to especially handle the case where executors are
> shutdown gracefully and in such case statestore shouldn't do failure
> detection and instead fail these executor immediately.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]