Joe McDonnell created IMPALA-7928:
-------------------------------------

             Summary: Investigate consistent placement of remote scan ranges
                 Key: IMPALA-7928
                 URL: https://issues.apache.org/jira/browse/IMPALA-7928
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 3.2.0
            Reporter: Joe McDonnell


With the file handle cache, it is useful for repeated scans of the same file to 
go to the same node, as that node will already have a file handle cached.

When scheduling remote ranges, the scheduler introduces randomness that can 
spread reads across all of the nodes. As a result, repeated executions of 
queries on the same set of files do not schedule the remote reads on the same 
nodes. Handles for the same files end up duplicated in the file handle caches 
of many different nodes, which significantly reduces the efficiency of the 
cache.

It may be useful for the scheduler to introduce some determinism in scheduling 
remote reads to take advantage of the file handle cache. This is a variation on 
the well-known tradeoff between skew and locality: pinning each file to a fixed 
node improves cache reuse, but can concentrate work on a few nodes.
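
As an illustration only, one way to make remote placement deterministic is 
rendezvous (highest-random-weight) hashing on the file path. The Python sketch 
below is not Impala's scheduler code; the function names, the node names, and 
the "k" and "load" parameters are all hypothetical. It ranks nodes by a hash of 
(node, file path), so the same file keeps mapping to the same node, and it 
optionally picks the least-loaded of the top k candidates to limit skew.

```python
import hashlib
from typing import Dict, List, Sequence


def _score(node: str, file_path: str) -> int:
    """Deterministic pseudo-random weight for a (node, file) pair."""
    digest = hashlib.sha256(f"{node}\0{file_path}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rank_nodes(file_path: str, nodes: Sequence[str]) -> List[str]:
    """Return nodes ordered from most to least preferred for this file."""
    return sorted(nodes, key=lambda n: _score(n, file_path), reverse=True)


def pick_node(file_path: str, nodes: Sequence[str]) -> str:
    """Pure locality: always choose the top-ranked node for the file."""
    if not nodes:
        raise ValueError("no candidate nodes")
    return rank_nodes(file_path, nodes)[0]


def pick_node_bounded(file_path: str, nodes: Sequence[str],
                      load: Dict[str, int], k: int = 2) -> str:
    """Trade some locality for less skew: choose the least-loaded node among
    the k most preferred ones. Ties go to the more preferred node."""
    if not nodes:
        raise ValueError("no candidate nodes")
    candidates = rank_nodes(file_path, nodes)[:max(1, k)]
    return min(candidates, key=lambda n: load.get(n, 0))


if __name__ == "__main__":
    # Hypothetical cluster and file names, for demonstration only.
    cluster = ["node-1", "node-2", "node-3", "node-4"]
    path = "/warehouse/t/part-0001.parq"
    # The result does not depend on the order in which nodes are listed.
    print(pick_node(path, cluster) == pick_node(path, cluster[::-1]))
```

With k = 1 this is pure locality: every scan of a file goes to one node. Larger 
values of k spread load across more nodes, at the cost of caching each file 
handle on up to k nodes. A useful property of this scheme is that removing a 
node only remaps the files that were assigned to that node.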

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

