[
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877406#comment-16877406
]
ASF subversion and git services commented on IMPALA-8341:
---------------------------------------------------------
Commit 875961da6fc60901685b2cc2de50644464ae102e in impala's branch
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=875961d ]
IMPALA-8341: [DOCS] Added a note about the experiemental feature
Change-Id: If9f5795a28738f7df435a3bea06ffa1a216fd04e
Reviewed-on: http://gerrit.cloudera.org:8080/13788
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Michael Ho <[email protected]>
> Data cache for remote reads
> ---------------------------
>
> Key: IMPALA-8341
> URL: https://issues.apache.org/jira/browse/IMPALA-8341
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Critical
> Fix For: Impala 3.3.0
>
>
> When running in public cloud (e.g. AWS with S3) or in certain private cloud
> settings (e.g. data stored in object store), the computation and storage are
> no longer co-located. This breaks the typical pattern in which Impala query
> fragment instances are scheduled at where the data is located. In this
> setting, the network bandwidth requirement of both the nics and the top of
> rack switches will go up quite a lot as the network traffic includes the data
> fetch in addition to the shuffling exchange traffic of intermediate results.
> To mitigate the pressure on the network, one can build a storage backed cache
> at the compute nodes to cache the working set. With deterministic scan range
> scheduling, each compute node should hold non-overlapping partitions of the
> data set.
> An initial prototype of the cache was posted here:
> [https://gerrit.cloudera.org/#/c/12683/] but it probably can benefit from a
> better eviction algorithm (e.g. LRU instead of FIFO) and better locking (e.g.
> not holding the lock while doing IO).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]