[ 
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800428#comment-16800428
 ] 

Michael Ho commented on IMPALA-8341:
------------------------------------

The initial goal is to keep a working set in the local storage (e.g. SSD, HDD) 
at the compute nodes and to rely on the OS kernel page cache management to keep 
the hot working set in memory.
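The read path for such a cache can be sketched as follows. This is a minimal illustration only, not Impala's actual implementation: `fetch_remote` and the cache-file naming scheme are hypothetical stand-ins.

```python
# Sketch of a file-backed block cache read path. On a miss the block is
# fetched from remote storage and written to a local file; on a hit it is
# a plain file read, so the OS kernel page cache transparently keeps the
# hot working set in memory with no user-space memory management.
import os

CACHE_DIR = "/tmp/impala_data_cache"  # directory on local SSD/HDD

def cached_read(file_id, offset, length, fetch_remote):
    """Read a block, filling the local cache on a cold miss.

    fetch_remote(file_id, offset, length) -> bytes is a hypothetical
    callback that reads from the remote store (e.g. S3).
    """
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{file_id}_{offset}_{length}")
    if not os.path.exists(path):                 # cold miss
        data = fetch_remote(file_id, offset, length)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:               # write-then-rename for atomicity
            f.write(data)
        os.rename(tmp, path)
    with open(path, "rb") as f:                  # hit: served via the page cache
        return f.read()
```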

Initially, the cached data will be purely blocks within a file on the remote 
storage. There is definitely a lot of room for experimentation / improvement 
in the future:
 - context-aware caching (e.g. caching uncompressed column chunks of Parquet 
files, caching deserialized footers of Parquet files)
 - cache tiering (e.g. keeping hot data in memory and evicting cold entries to 
secondary storage such as NVMe or SSD)
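The cache-tiering idea above can be sketched as a two-level structure. This is a toy illustration under assumed semantics (an in-memory LRU tier spilling to a dict that stands in for secondary storage), not a proposed design:

```python
from collections import OrderedDict

class TieredCache:
    """Toy two-tier cache: hot entries in memory, cold entries evicted to
    a dict standing in for secondary storage (NVMe/SSD)."""

    def __init__(self, mem_capacity):
        self.mem = OrderedDict()   # hot tier, maintained in LRU order
        self.disk = {}             # cold tier stand-in
        self.cap = mem_capacity

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)          # refresh recency
            return self.mem[key]
        if key in self.disk:                   # promote from cold tier
            return self.put(key, self.disk.pop(key))
        return None

    def put(self, key, val):
        self.mem[key] = val
        self.mem.move_to_end(key)
        while len(self.mem) > self.cap:        # demote LRU entry to cold tier
            old_key, old_val = self.mem.popitem(last=False)
            self.disk[old_key] = old_val
        return val
```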

The idea behind a simple storage-based cache is that:
1. storage is relatively inexpensive compared to memory
2. once the working set fits in local storage after the initial cold misses, 
performance should be close to that of local-read configurations.

> Data cache for remote reads
> ---------------------------
>
>                 Key: IMPALA-8341
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8341
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Major
>
> When running in public cloud (e.g. AWS with S3) or in certain private cloud 
> settings (e.g. data stored in object store), the computation and storage are 
> no longer co-located. This breaks the typical pattern in which Impala query 
> fragment instances are scheduled where the data is located. In this 
> setting, the network bandwidth requirements of both the NICs and the 
> top-of-rack switches go up significantly, as the network traffic includes 
> data fetches in addition to the shuffle exchange traffic of intermediate 
> results.
> To mitigate the pressure on the network, one can build a storage-backed 
> cache at the compute nodes to cache the working set. With deterministic 
> scan range scheduling, each compute node should hold non-overlapping 
> partitions of the data set. 
> A prototype of the cache was posted here: 
> https://gerrit.cloudera.org/#/c/12683/
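The deterministic scan range scheduling mentioned in the description can be sketched as a hash-based assignment. The function and key scheme below are illustrative assumptions, not Impala's scheduler; a production scheduler would typically use consistent hashing to limit cache churn when nodes join or leave:

```python
import hashlib

def assign_node(file_path, offset, nodes):
    """Deterministically map a scan range (file, offset) to a compute node.

    The same range always maps to the same node, so each node's local
    cache ends up holding a non-overlapping partition of the data set.
    """
    key = f"{file_path}:{offset}".encode()
    # Use a stable hash (not Python's randomized hash()) so the mapping
    # is identical across processes and queries.
    h = int.from_bytes(hashlib.md5(key).digest()[:8], "big")
    return nodes[h % len(nodes)]
```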



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
