[ 
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878020#comment-16878020
 ] 

ASF subversion and git services commented on IMPALA-8341:
---------------------------------------------------------

Commit 5fdef39fcf7e1b59bcf8d670d217ca7a88fc2738 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5fdef39 ]

IMPALA-8341: [DOCS] Added a note about the requirement for existing dirs

Change-Id: I5feddddff3ab7c09ee681098ec3e630977cb92f1
Reviewed-on: http://gerrit.cloudera.org:8080/13793
Reviewed-by: Alex Rodoni <arod...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Data cache for remote reads
> ---------------------------
>
>                 Key: IMPALA-8341
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8341
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Critical
>             Fix For: Impala 3.3.0
>
>
> When running in public cloud (e.g. AWS with S3) or in certain private cloud 
> settings (e.g. data stored in object store), the computation and storage are 
> no longer co-located. This breaks the typical pattern in which Impala query 
> fragment instances are scheduled at where the data is located. In this 
> setting, the network bandwidth requirement of both the nics and the top of 
> rack switches will go up quite a lot as the network traffic includes the data 
> fetch in addition to the shuffling exchange traffic of intermediate results.
> To mitigate the pressure on the network, one can build a storage backed cache 
> at the compute nodes to cache the working set. With deterministic scan range 
> scheduling, each compute node should hold non-overlapping partitions of the 
> data set. 
> An initial prototype of the cache was posted here: 
> [https://gerrit.cloudera.org/#/c/12683/] but it probably can benefit from a 
> better eviction algorithm (e.g. LRU instead of FIFO) and better locking (e.g. 
> not holding the lock while doing IO).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to