Michael Ho created IMPALA-8685:
----------------------------------

             Summary: Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
                 Key: IMPALA-8685
                 URL: https://issues.apache.org/jira/browse/IMPALA-8685
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Michael Ho


The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default.
This means that up to 3 different executors can process a given remote scan
range, so over time the data of that scan range will be spread across these 3
executors. My understanding is that it is not set to 1 in order to avoid hot
spots in pathological cases. On the other hand, spreading a scan range across
several executors may keep us from maximizing the utilization of the file
handle cache and data cache. Also, for small clusters (e.g. a 3 node cluster),
the default value may render deterministic remote scan range scheduling
ineffective, since all 3 executors could then be candidates for every scan
range. We may want to re-evaluate the default value of
{{NUM_REMOTE_EXECUTOR_CANDIDATES}}. One idea is to set it to
min(3, half of cluster size) so it works well with small clusters, which may
be rather common for demo purposes. There may also be other criteria for
evaluating the default value.
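
To make the min(3, half of cluster size) idea concrete, here is a minimal sketch in Python. It is not Impala code: the function name and the {{cluster_size}} parameter are hypothetical, and both the integer division and the floor of 1 (so a 1 node cluster still gets a candidate) are assumptions the proposal above does not spell out.

```python
def proposed_default_candidates(cluster_size: int) -> int:
    """Hypothetical default for NUM_REMOTE_EXECUTOR_CANDIDATES.

    Implements min(3, half of cluster size). Assumptions not stated in the
    proposal: "half" uses integer division, and the result is never below 1.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    half = cluster_size // 2          # assumption: round down
    return max(1, min(3, half))       # assumption: floor of 1 candidate


if __name__ == "__main__":
    # Show how the proposed default would scale with the number of executors.
    for size in (1, 2, 3, 4, 5, 6, 10, 100):
        print(f"{size:>3} executors -> {proposed_default_candidates(size)} candidate(s)")
```

Under these assumptions a 3 node cluster would get 1 candidate per remote scan range instead of 3, and the current value of 3 would only apply from 6 executors upward.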

cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
