Michael Ho created IMPALA-8685:
----------------------------------
Summary: Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
Key: IMPALA-8685
URL: https://issues.apache.org/jira/browse/IMPALA-8685
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Michael Ho
The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default,
which means that up to 3 different executors can process a given remote scan
range. Over time, the data of that scan range will be spread across these 3
executors. My understanding is that it is not set to 1 in order to avoid hot
spots in pathological cases. On the other hand, spreading a scan range across
several executors means we may not maximize the utilization of the file handle
cache and data cache. Also, for small clusters (e.g. a 3 node cluster), the
default value may render deterministic remote scan range scheduling
ineffective, since every executor in the cluster is then a candidate for every
scan range.

We may want to re-evaluate the default value of
{{NUM_REMOTE_EXECUTOR_CANDIDATES}}. One idea is to set it to
min(3, half of cluster size) so that it works reasonably well with small
clusters, which may be rather common for demo purposes. There may also be other
criteria for evaluating the default value.
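To make the min(3, half of cluster size) idea concrete, here is a rough sketch in Python. It is not Impala code: the function names are hypothetical, and both the rounding (floor of half the cluster size) and the lower bound of 1 are assumptions that the proposal above does not specify.

```python
def suggested_num_candidates(cluster_size: int, cap: int = 3) -> int:
    """Hypothetical default: min(cap, half the cluster size).

    Assumptions not stated in the proposal: "half" is rounded down,
    and the result is never allowed to drop below 1.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return max(1, min(cap, cluster_size // 2))


def candidate_fraction(cluster_size: int, num_candidates: int) -> float:
    """Fraction of executors eligible to process any one remote scan range."""
    return min(num_candidates, cluster_size) / cluster_size


if __name__ == "__main__":
    # Compare the fixed default of 3 with the scaled default.
    for size in (1, 2, 3, 4, 6, 10, 100):
        fixed = candidate_fraction(size, 3)
        scaled_n = suggested_num_candidates(size)
        scaled = candidate_fraction(size, scaled_n)
        print(f"{size:>4} nodes: fixed 3 -> {fixed:.0%} eligible; "
              f"scaled {scaled_n} -> {scaled:.0%} eligible")
```

Under these assumptions a 3 node cluster would get 1 candidate per remote scan range instead of 3, while clusters of 6 or more nodes would keep the current value of 3.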
cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]