Joe McDonnell created IMPALA-7928:
-------------------------------------
Summary: Investigate consistent placement of remote scan ranges
Key: IMPALA-7928
URL: https://issues.apache.org/jira/browse/IMPALA-7928
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.2.0
Reporter: Joe McDonnell
With the file handle cache, it is useful for repeated scans of the same file to
go to the same node, as that node will already have a file handle cached.
When scheduling remote ranges, the scheduler introduces randomness that can
spread reads across all of the nodes. As a result, repeated executions of
queries on the same set of files are not guaranteed to schedule the remote
reads on the same nodes. Handles for the same files end up duplicated across
the file handle caches on different nodes, which significantly reduces the
efficiency of the cache.
It may be useful for the scheduler to introduce some determinism in scheduling
remote reads to take advantage of the file handle cache. This is a variation on
the well-known tradeoff between skew and locality.
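
One possible way to make remote placement deterministic is rendezvous
(highest-random-weight) hashing on the file path. The sketch below is only an
illustration of that idea, not Impala's scheduler code: the function names,
the executor address strings, and the num_candidates parameter are all
hypothetical.

```python
import hashlib


def _score(file_path: str, executor: str) -> int:
    """Stable 64-bit score for a (file, executor) pair.

    hashlib is used instead of the built-in hash(), which is randomized
    per process and so would not give repeatable placement across runs.
    """
    key = f"{executor}\0{file_path}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_executors(file_path, executors, num_candidates=1):
    """Return the num_candidates executors with the highest score for a file.

    Hypothetical helper: the same file always maps to the same executors
    as long as those executors remain in the list.
    """
    if not executors:
        raise ValueError("no executors available")
    ranked = sorted(executors, key=lambda e: _score(file_path, e), reverse=True)
    return ranked[:num_candidates]


if __name__ == "__main__":
    # Hypothetical executor addresses and file path, for illustration only.
    executors = ["host-a:22000", "host-b:22000", "host-c:22000"]
    path = "hdfs://nn/warehouse/t/part-0001.parq"
    # Repeated calls give the same answer, regardless of list order.
    print(pick_executors(path, executors))
    print(pick_executors(path, list(reversed(executors))))
```

With num_candidates greater than 1, a scheduler could choose the least-loaded
of the top few candidates, which gives up some locality to limit skew.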