[ https://issues.apache.org/jira/browse/IMPALA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750483#comment-16750483 ]
Joe McDonnell commented on IMPALA-7928:
---------------------------------------

[~arodoni_cloudera] I don't think this will have a doc impact. This will have a new query option, but it should not be documented.

> Investigate consistent placement of remote scan ranges
> ------------------------------------------------------
>
>                 Key: IMPALA-7928
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7928
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> With the file handle cache, it is useful for repeated scans of the same file
> to go to the same node, as that node will already have a file handle cached.
> When scheduling remote ranges, the scheduler introduces randomness that can
> spread reads across all of the nodes. Repeated executions of queries on the
> same set of files will not schedule the remote reads on the same nodes. This
> causes a large amount of duplication across file handle caches on different
> nodes. This reduces the efficiency of the cache significantly.
> It may be useful for the scheduler to introduce some determinism in
> scheduling remote reads to take advantage of the file handle cache. This is a
> variation on the well-known tradeoff between skew and locality.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
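
For illustration only: the determinism proposed in the issue description could be sketched with rendezvous (highest-random-weight) hashing, where each file is always mapped to the same node for as long as that node is available. This is not Impala's scheduler code or the fix made for this issue. The function name `pick_remote_executor`, the node names, and the file paths are hypothetical, and the choice of SHA-256 is an assumption made only to get a hash that is stable across processes.

```python
import hashlib
from typing import Sequence


def pick_remote_executor(file_path: str, executors: Sequence[str]) -> str:
    """Pick a node for a remote read of file_path, deterministically.

    Hypothetical sketch: every (executor, file) pair gets a pseudo-random
    weight derived from a stable hash, and the executor with the highest
    weight wins. The same file therefore lands on the same node on every
    run, which is what lets a per-node file handle cache be reused.
    """
    if not executors:
        raise ValueError("no executors available")

    def weight(executor: str) -> int:
        # hashlib is used instead of the built-in hash(), which is
        # randomized per process for strings and so not repeatable.
        key = f"{executor}\0{file_path}".encode("utf-8")
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

    return max(executors, key=weight)


# Repeated scheduling of the same file picks the same node.
nodes = ["node-1", "node-2", "node-3"]
first = pick_remote_executor("/warehouse/t/part-0.parq", nodes)
second = pick_remote_executor("/warehouse/t/part-0.parq", nodes)
assert first == second
```

Placing reads purely by file hash favors locality over balance: a few large or frequently scanned files can overload one node, which is the skew side of the tradeoff the description mentions. A real scheduler would likely need to weigh this against current load.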