In the current cloud deployment, users are limited by the disk space of the
cluster's nodes. However, the blob storage services offered by cloud
providers (e.g., S3) can store a virtually "unlimited" amount of data.
Thus, AsterixDB can provide the means to store more data than the cluster's
local drives can hold.

In this proposal, we want to extend AsterixDB's capability to allow the
local drives to act as a cache, instead of a mirror image of what's stored
in the cloud. By "as a cache" we mean that files and pages can be
retrieved/persisted and removed (evicted) from the local drives, according
to some policy.

The aim of this proposal is to describe and implement a mechanism called
"*Weep and Sweep*". These are the names of the two phases that run when the
amount of data in the cloud exceeds the capacity of the cluster's local disks.

Weep

When the disk is under pressure (the pressure threshold is configurable),
the system will start to "weep": it devises a plan for what should be
"evicted" according to some statistics and policies, *which are not
solidified yet and are still a work in progress.*
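
As a rough illustration of the weep phase, the sketch below checks for disk
pressure and builds an eviction plan. The proposal does not fix a policy yet,
so everything here is an assumption made for illustration: the names
(`CachedFile`, `is_pressured`, `weep`), the 0.8/0.6 ratios, and the
least-recently-used ordering are hypothetical and are not AsterixDB APIs.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    """A locally cached file (hypothetical model, not an AsterixDB class)."""
    path: str
    size_bytes: int
    last_access: float      # e.g., seconds since the epoch
    evictable: bool = True  # metadata indexes would set this to False


def is_pressured(used_bytes, capacity_bytes, pressure_ratio=0.8):
    """True once local usage reaches the configured pressure threshold."""
    return used_bytes >= pressure_ratio * capacity_bytes


def weep(files, capacity_bytes, pressure_ratio=0.8, target_ratio=0.6):
    """Return the paths to evict, least recently used first (assumed policy).

    The plan stops growing once projected usage drops to the target ratio.
    Nothing is deleted here; this phase only produces the plan.
    """
    used = sum(f.size_bytes for f in files)
    if not is_pressured(used, capacity_bytes, pressure_ratio):
        return []
    target = target_ratio * capacity_bytes
    plan = []
    candidates = sorted((f for f in files if f.evictable),
                        key=lambda f: f.last_access)
    for f in candidates:
        if used <= target:
            break
        plan.append(f.path)
        used -= f.size_bytes
    return plan
```

A real policy would likely weigh more statistics than the last access time;
the point is only that weeping yields a plan and leaves the eviction itself
to the sweep phase.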

Sweep

After "weeping", a sweep operation takes place and evicts whatever the weep
plan marks as evictable. Depending on the index type (primary/secondary)
and the storage format (row/column), the smallest evictable unit can
differ. The following table shows the smallest evictable unit for each
index type:

*Index Type*                              *Evictable*
Metadata indexes (e.g., Dataset, etc.)    Not evictable
Secondary indexes                         Evicted as a whole
Primary indexes (row)                     Evicted as a whole
Primary indexes (columnar)                Columns (or columns’ pages)
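
The mapping from index type to smallest evictable unit, and a sweep that
respects it, could be sketched as follows. This is a minimal in-memory
model only: `IndexKind`, `EvictionUnit`, `smallest_evictable_unit`, and
`sweep` are hypothetical names, and the local disk is represented by a plain
dictionary of entry name to size in bytes.

```python
from enum import Enum


class IndexKind(Enum):
    METADATA = "metadata"
    SECONDARY = "secondary"
    PRIMARY_ROW = "primary_row"
    PRIMARY_COLUMNAR = "primary_columnar"


class EvictionUnit(Enum):
    NONE = "not evictable"
    WHOLE_INDEX = "evicted as a whole"
    COLUMN = "columns (or columns' pages)"


# Mirrors the index-type table from the proposal.
EVICTION_UNIT = {
    IndexKind.METADATA: EvictionUnit.NONE,
    IndexKind.SECONDARY: EvictionUnit.WHOLE_INDEX,
    IndexKind.PRIMARY_ROW: EvictionUnit.WHOLE_INDEX,
    IndexKind.PRIMARY_COLUMNAR: EvictionUnit.COLUMN,
}


def smallest_evictable_unit(kind):
    """Look up the smallest unit that may be evicted for an index kind."""
    return EVICTION_UNIT[kind]


def sweep(plan, local_cache):
    """Apply a weep plan to local_cache (dict: entry name -> size in bytes).

    `plan` is a list of (IndexKind, entry name) pairs. Entries of
    non-evictable kinds are skipped. Returns the number of bytes freed.
    """
    freed = 0
    for kind, name in plan:
        if smallest_evictable_unit(kind) is EvictionUnit.NONE:
            continue  # metadata indexes always stay on local disk
        freed += local_cache.pop(name, 0)
    return freed
```

For example, a plan that lists a metadata index, a secondary index, and one
column of a columnar primary index would free only the latter two entries.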

Featured Considerations

   - Columnar primary indexes will never be downloaded as a whole
      - Instead, columns will be streamed from the cloud (if accessed for
      the first time) and persisted to local disk if necessary
   - We are considering providing a mechanism to prefetch the next columns
   of the next mega-leaf node
   <https://www.vldb.org/pvldb/vol15/p2085-alkowaileet.pdf>. The hope here
   is to mask any latencies when reading columns from the cloud
   - Depending on the disk pressure and the operation, the system can
   determine if the streamed columns from the cloud are "worthy" to be cached
   locally. For example, if columns are read in a merge operation, it might
   not be "wise" to persist these columns as their on-disk component is going
   to be deleted at the end of the merge operation. Thus, it might be "better"
   to dedicate the free space on disk for the newly created/merged component.
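
The last consideration above, deciding whether streamed columns are worth
caching, might look like the sketch below. The `Operation` enum, the
function name, and the `reserved_bytes` parameter are assumptions made for
illustration; the proposal does not define the actual decision rule.

```python
from enum import Enum


class Operation(Enum):
    QUERY = "query"
    MERGE = "merge"


def should_cache_locally(operation, column_bytes, free_bytes, reserved_bytes=0):
    """Decide whether columns streamed from the cloud should be persisted.

    Columns read by a merge are not cached, because their on-disk component
    is deleted when the merge finishes. For other operations, cache only if
    the columns fit in the free space left after setting aside
    `reserved_bytes` (e.g., room for a newly created or merged component).
    """
    if operation is Operation.MERGE:
        return False
    return column_bytes <= free_bytes - reserved_bytes
```

Under this rule, `should_cache_locally(Operation.QUERY, 10, 100)` is true,
while the same read issued by a merge is never persisted.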


Multiple aspects (such as the evictable units and policies) of this APE are
not solidified yet, but the core concepts are in place and are ready for
the community's vote :)

EPIC: ASTERIXDB-3373 <https://issues.apache.org/jira/browse/ASTERIXDB-3373>
-- 

*Regards,*
Wail Alkowaileet
