[ 
https://issues.apache.org/jira/browse/IMPALA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761494#comment-16761494
 ] 

ASF subversion and git services commented on IMPALA-7928:
---------------------------------------------------------

Commit 24eab713a0d35f629509f59711f8a563e1346acf in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=24eab71 ]

IMPALA-7928: Consistent remote read scheduling

Currently, remote reads for a particular file are not scheduled to
a consistent set of nodes. This reduces the efficiency of the HDFS
file handle cache.

This modifies the scheduling of remote reads to limit the number
of executors considered when picking an executor for a remote scan
range. The remote executor candidates are generated by hashing the
filename+offset multiple times and finding the closest nodes in a
hash ring. This is a consistent hash that is designed to limit the
number of files remapped when cluster nodes come and go. The number
of remote executor candidates is controlled by a query option
'num_remote_executor_candidates', which defaults to 3. It is
capped at 16.
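
As a rough illustration of the candidate selection described above, here
is a minimal Python sketch. It is not the code from the commit. The class
name HashRing, the helper _hash64, the number of ring points per node, the
key format, and the rule for skipping duplicate candidates are all
assumptions made for this example. Only the default of 3 and the cap of 16
come from the text above.

```python
import bisect
import hashlib

# From the commit message: the option defaults to 3 and is capped at 16.
DEFAULT_REMOTE_EXECUTOR_CANDIDATES = 3
MAX_REMOTE_EXECUTOR_CANDIDATES = 16


def _hash64(key: str) -> int:
    """Stable 64-bit hash (hypothetical helper; built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent hash ring; not Impala's actual data structure."""

    def __init__(self, nodes, points_per_node=25):
        # Each node is placed at several points on the ring to even out the load.
        # The value 25 is an arbitrary choice for this sketch.
        points = [(_hash64(f"{node}#{i}"), node)
                  for node in nodes for i in range(points_per_node)]
        points.sort()
        self._points = points
        self._keys = [h for h, _ in points]
        self._num_nodes = len(set(nodes))

    def candidates(self, filename, offset,
                   num_candidates=DEFAULT_REMOTE_EXECUTOR_CANDIDATES):
        """Hash filename+offset repeatedly and collect the closest distinct nodes."""
        wanted = min(num_candidates, MAX_REMOTE_EXECUTOR_CANDIDATES, self._num_nodes)
        chosen = []
        rehash = 0
        while len(chosen) < wanted:
            h = _hash64(f"{filename}:{offset}:{rehash}")
            idx = bisect.bisect_left(self._keys, h)
            # Assumption: if the closest node is already a candidate, walk
            # clockwise to the next node that is not.
            for step in range(len(self._points)):
                node = self._points[(idx + step) % len(self._points)][1]
                if node not in chosen:
                    chosen.append(node)
                    break
            rehash += 1
        return chosen
```

In this sketch, removing a node that is not among a scan range's
candidates leaves those candidates unchanged, which is the property that
limits remapping when cluster nodes come and go.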

Once the remote executor candidates are chosen, a specific executor
is picked with the same algorithm used for picking a local replica:
the node with the minimum number of assigned bytes is chosen, and
'schedule_random_replica' determines how ties are broken.
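
The selection step can be sketched as follows. This is an illustration
only, not the scheduler's code. The function name pick_executor and the
assigned_bytes mapping are hypothetical, and the exact tie-breaking
behavior is an assumption: here a true 'schedule_random_replica' picks
randomly among tied candidates, and a false value takes the first tied
candidate in list order.

```python
import random


def pick_executor(candidates, assigned_bytes, schedule_random_replica, rng=None):
    """Pick the candidate with the fewest assigned bytes (illustrative sketch).

    candidates: non-empty list of executor names.
    assigned_bytes: dict mapping executor name to bytes already assigned;
        executors missing from the dict are treated as having 0 bytes.
    """
    least = min(assigned_bytes.get(c, 0) for c in candidates)
    tied = [c for c in candidates if assigned_bytes.get(c, 0) == least]
    if schedule_random_replica and len(tied) > 1:
        # Assumption: random choice among equally loaded candidates.
        return (rng or random).choice(tied)
    # Assumption: otherwise the first tied candidate in list order wins.
    return tied[0]
```

For example, with candidates a, b, c holding 10, 5, and 7 assigned bytes,
the sketch picks b regardless of the tie-breaking option.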

This leaves the normal algorithms in place for local files, Kudu,
and HBase. If 'num_remote_executor_candidates' is set to 0, the
existing remote scheduling algorithm is used. The existing
algorithm schedules remote scan ranges on all available executors.

Testing:
 - There is a new hash-ring-test and related tests in scheduler-test.
 - There is a utility (hash-ring-util) in experiments for hand tuning
   the hash ring.

Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Reviewed-on: http://gerrit.cloudera.org:8080/12037
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Investigate consistent placement of remote scan ranges
> ------------------------------------------------------
>
>                 Key: IMPALA-7928
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7928
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> With the file handle cache, it is useful for repeated scans of the same file 
> to go to the same node, as that node will already have a file handle cached.
> When scheduling remote ranges, the scheduler introduces randomness that can 
> spread reads across all of the nodes. Repeated executions of queries on the 
> same set of files will not schedule the remote reads on the same nodes. This 
> causes a large amount of duplication across file handle caches on different 
> nodes. This reduces the efficiency of the cache significantly.
> It may be useful for the scheduler to introduce some determinism in 
> scheduling remote reads to take advantage of the file handle cache. This is a 
> variation on the well-known tradeoff between skew and locality.
>  


