+1

On Dec 1, 2023 at 12:28:23, Murtadha Al-Hubail <[email protected]> wrote:

> Each AsterixDB cluster today consists of one or more Node Controllers (NC)
> where the data is stored and processed. Each NC has a predefined set of
> storage partitions (iodevices). When data is ingested into the system, the
> data is hash-partitioned across the total number of storage partitions in
> the cluster. Similarly, when the data is queried, each NC will start as
> many threads as the number of storage partitions it has to read and process
> the data in parallel. While this shared-nothing architecture has its
> advantages, it also has drawbacks. One major drawback is the time needed
> to scale the cluster. Adding a new NC to an existing cluster of (n) nodes
> means writing a completely new copy of the data which will now be
> hash-partitioned to the new total number of storage partitions of (n + 1)
> nodes. This operation could potentially take several hours or even days
> which is unacceptable in the cloud age.
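To make the rescaling cost concrete, here is a small standalone sketch (illustrative only, not AsterixDB code) of why adding a node to a hash-partitioned cluster rewrites most of the data: with `partition = hash(key) mod totalPartitions`, changing the partition count remaps the majority of records.

```java
import java.util.stream.IntStream;

public class RepartitionCost {
    // Hypothetical hash-partitioning rule: partition = hash(key) mod totalPartitions.
    static int partitionOf(int key, int totalPartitions) {
        return Math.floorMod(Integer.hashCode(key), totalPartitions);
    }

    public static void main(String[] args) {
        int before = 4, after = 5; // total storage partitions before/after adding a node
        long moved = IntStream.range(0, 1_000_000)
                .filter(k -> partitionOf(k, before) != partitionOf(k, after))
                .count();
        // With mod-based hashing, 80% of these keys land in a different
        // partition after the change, so they must be rewritten.
        System.out.printf("%.0f%% of records move when going from %d to %d partitions%n",
                100.0 * moved / 1_000_000, before, after);
    }
}
```

This is exactly the cost that a fixed number of storage partitions avoids: if the partition count never changes, `partitionOf` stays stable for every key regardless of how many NCs exist.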
>
> This APE is about adding a new deployment (cloud) mode to AsterixDB by
> implementing compute-storage separation to take advantage of the elasticity
> of the cloud. This will require the following:
>
> 1. Moving from the dynamic data partitioning described earlier to static
> data partitioning based on a number of storage partitions that is
> configurable but fixed for the lifetime of a cluster.
> 2. Introducing the concept of a "compute partition" where each NC will have
> a fixed number of compute partitions. This number could potentially be
> based on the number of CPU cores it has.
>
> This will decouple the number of storage partitions being processed on an
> NC from the number of its compute partitions.
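The decoupling in points 1 and 2 could look like the following sketch (names and the round-robin policy are my own assumptions, not the proposed implementation): a fixed set of storage partitions is spread over however many compute partitions an NC happens to have.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionAssignment {
    // Illustrative only: spread an NC's assigned storage partitions over its
    // compute partitions round-robin. The storage partition count is static;
    // the compute partition count can vary per NC (e.g. with CPU cores).
    static Map<Integer, List<Integer>> assign(int storagePartitions, int computePartitions) {
        Map<Integer, List<Integer>> map = new HashMap<>();
        for (int c = 0; c < computePartitions; c++) {
            map.put(c, new ArrayList<>());
        }
        for (int s = 0; s < storagePartitions; s++) {
            map.get(s % computePartitions).add(s);
        }
        return map;
    }

    public static void main(String[] args) {
        // 16 static storage partitions on an NC with 4 compute partitions:
        // each compute partition (query thread) reads 4 storage partitions.
        System.out.println(assign(16, 4));
    }
}
```

The same 16 storage partitions could later be processed by an NC with 8 compute partitions, with no change to how the data is stored.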
>
> When an AsterixDB cluster is deployed using the cloud mode, we will do the
> following:
>
> - The Cluster Controller will maintain a map containing the assignment of
> storage partitions to compute partitions.
> - New writes will be written to the NC's local storage and uploaded to an
> object store (e.g. AWS S3) which will be used as a highly available shared
> filesystem between NCs.
> - On queries, each NC will start as many threads as its compute partitions
> to process its currently assigned storage partitions.
> - On scaling operations, we will simply update the assignment map and NCs
> will lazily cache any data of newly assigned storage partitions from the
> object store.
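A minimal sketch of the scaling step above, assuming (as a simplification I am introducing, not something the APE specifies) that the Cluster Controller's map is a simple storage-partition-to-NC assignment:

```java
import java.util.Map;
import java.util.TreeMap;

public class ScaleOut {
    // Hypothetical Cluster Controller view: storage partition -> NC id.
    // Only this map changes on a scaling operation; no data is rewritten
    // eagerly, because every NC can pull partition files from the object store.
    static Map<Integer, Integer> rebalance(int storagePartitions, int nodes) {
        Map<Integer, Integer> assignment = new TreeMap<>();
        for (int s = 0; s < storagePartitions; s++) {
            assignment.put(s, s % nodes);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> before = rebalance(8, 2);
        Map<Integer, Integer> after = rebalance(8, 3);
        // Going from 2 to 3 NCs: a reassigned NC lazily caches the data of its
        // newly assigned storage partitions from the object store on first access.
        before.forEach((p, nc) -> {
            if (!after.get(p).equals(nc)) {
                System.out.println("storage partition " + p + ": NC " + nc + " -> NC " + after.get(p));
            }
        });
    }
}
```

Since the number of storage partitions is fixed, the map update is metadata-only, which is what makes the scaling operation fast compared to the rehash-and-rewrite described at the top of the proposal.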
>
> Improvement tickets:
> Static data partitioning:
> https://issues.apache.org/jira/browse/ASTERIXDB-3144
> Compute-Storage Separation:
> https://issues.apache.org/jira/browse/ASTERIXDB-3196
>
> Please vote on this APE. We'll keep this open for 72 hours and pass with
> either 3 votes or a majority of positive votes.
>
