[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wail Y. Alkowaileet updated ASTERIXDB-3373:
-------------------------------------------
    Description: 
In the current cloud deployment, users are limited by the disk space of the 
cluster's nodes. However, the blob storage services provided by cloud providers 
(e.g., S3) can store a virtually "unlimited" amount of data. Thus, AsterixDB 
can provide the means to store more data than the cluster's local drives can hold.

In this proposal, we want to extend AsterixDB so that the local drives act as a 
cache, instead of a mirror image of what's stored in the cloud. By "as a cache" 
we mean that files and pages can be retrieved/persisted and removed (evicted) 
from the local drives according to some policy.

The aim of this proposal is to describe and implement a mechanism called 
"{*}Weep and Sweep{*}". These are the names of the two phases that take place 
when the amount of data in the cloud exceeds the capacity of the cluster's local disks.
h2. Weep

When the disk is under pressure (the pressure threshold can be configured), the 
system will start to "weep" and devise a plan for what should be "evicted" 
according to some statistics and policies, *which are not solidified yet and 
are still a work in progress.*
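
As a rough illustration only, a weep pass could look like the sketch below. All names here (e.g., WeepPlanner, EvictionCandidate) are hypothetical, and the least-recently-accessed ranking is just a placeholder for the statistics/policies mentioned above:
{code:java}
// Hypothetical sketch: trigger a "weep" plan once local disk usage crosses a
// configurable pressure threshold. Names and the ranking policy are illustrative only.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class WeepPlanner {
    private final double pressureThreshold; // fraction of local disk capacity, e.g., 0.85

    WeepPlanner(double pressureThreshold) {
        this.pressureThreshold = pressureThreshold;
    }

    /** True once used space crosses the configured pressure threshold. */
    boolean isUnderPressure(long usedBytes, long capacityBytes) {
        return (double) usedBytes / capacityBytes >= pressureThreshold;
    }

    /** Ranks candidates (here: least recently accessed first) and picks enough to free the requested space. */
    List<EvictionCandidate> plan(List<EvictionCandidate> candidates, long bytesToFree) {
        candidates.sort(Comparator.comparingLong(EvictionCandidate::lastAccessTime));
        List<EvictionCandidate> toEvict = new ArrayList<>();
        long freed = 0;
        for (EvictionCandidate candidate : candidates) {
            if (freed >= bytesToFree) {
                break;
            }
            toEvict.add(candidate);
            freed += candidate.sizeInBytes();
        }
        return toEvict;
    }
}

/** A locally cached file (or file region) that the sweeper may evict. */
record EvictionCandidate(String localPath, long sizeInBytes, long lastAccessTime) {}
{code}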
h2. Sweep

After "weeping", a sweep operation will take place and start evicting what the 
weep's plan considers as evictable. Depending on the index type 
(primary/secondary) and the storage format (row/column), the smallest evictable 
unit can differ. The following table shows the smallest unit of evictable unit:
|*Index Type*|*Smallest Evictable Unit*|
|Metadata indexes (e.g., Dataset, etc.)|Not evictable|
|Secondary indexes|Evicted as a whole|
|Primary indexes (row)|Evicted as a whole|
|Primary indexes (columnar)|Columns (or columns’ pages)|
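
The mapping in the table could be captured as follows (a minimal sketch; the enum and class names are illustrative and do not correspond to existing AsterixDB code):
{code:java}
// Hypothetical sketch of the eviction granularity implied by the table above.
enum IndexKind { METADATA, SECONDARY, PRIMARY_ROW, PRIMARY_COLUMNAR }

enum EvictionUnit { NOT_EVICTABLE, WHOLE_INDEX, COLUMN_PAGES }

final class EvictionGranularity {
    /** Maps an index kind to the smallest unit the sweeper is allowed to evict. */
    static EvictionUnit unitFor(IndexKind kind) {
        switch (kind) {
            case METADATA:
                return EvictionUnit.NOT_EVICTABLE; // metadata indexes always stay on local disk
            case SECONDARY:
            case PRIMARY_ROW:
                return EvictionUnit.WHOLE_INDEX;   // evicted as a whole
            case PRIMARY_COLUMNAR:
                return EvictionUnit.COLUMN_PAGES;  // individual columns (or their pages)
            default:
                throw new IllegalArgumentException("Unknown index kind: " + kind);
        }
    }
}
{code}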
h2. Featured Considerations
 * Columnar primary indexes will never be downloaded as a whole
 ** Instead, columns will be streamed from the cloud (when accessed for the first 
time) and persisted to the local disk if necessary
 * We are considering providing a mechanism to prefetch the columns of the next 
[mega-leaf node|https://www.vldb.org/pvldb/vol15/p2085-alkowaileet.pdf]. 
The hope is to mask the latency of reading columns from the cloud
 * Depending on the disk pressure and the operation, the system can determine 
whether the columns streamed from the cloud are "worth" caching locally. For 
example, if columns are read by a merge operation, it might not be "wise" to 
persist them, as their on-disk component is going to be deleted at the end of 
the merge. Thus, it might be "better" to dedicate the free disk space to the 
newly created (merged) component instead (see the sketch after this list). 
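
A minimal sketch of the last consideration, assuming a hypothetical ColumnCachePolicy that knows which operation triggered the read and the current disk utilization (the operation kinds and the policy itself are assumptions, not the final design):
{code:java}
// Hypothetical sketch of the "is it worth caching?" decision for columns streamed from the cloud.
enum ReadContext { QUERY_SCAN, POINT_LOOKUP, MERGE }

final class ColumnCachePolicy {
    private final double pressureThreshold; // fraction of local disk capacity, e.g., 0.85

    ColumnCachePolicy(double pressureThreshold) {
        this.pressureThreshold = pressureThreshold;
    }

    /**
     * Columns read by a merge belong to components that will be deleted when the merge
     * finishes, so persisting them locally would take space away from the newly merged
     * component; other reads cache only while the disk is not under pressure.
     */
    boolean shouldPersistLocally(ReadContext context, double diskUtilization) {
        if (context == ReadContext.MERGE) {
            return false;
        }
        return diskUtilization < pressureThreshold;
    }
}
{code}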

 


> Unlimited Storage: Local disk caching in cloud deployment
> ---------------------------------------------------------
>
>                 Key: ASTERIXDB-3373
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3373
>             Project: Apache AsterixDB
>          Issue Type: Epic
>    Affects Versions: 0.9.9
>            Reporter: Wail Y. Alkowaileet
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>             Fix For: 0.9.9
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
