[ https://issues.apache.org/jira/browse/ASTERIXDB-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abdullah Alamoudi reassigned ASTERIXDB-1337:
--------------------------------------------
Assignee: Michael Blow
> Dataset Memory Management on Multi-Partition NC
> -----------------------------------------------
>
> Key: ASTERIXDB-1337
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1337
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: AsterixDB, Storage
> Reporter: Murtadha Hubail
> Assignee: Michael Blow
>
> Currently, each dataset has a fixed memory budget, its total virtual buffer
> cache (VBC) budget, which is configurable through the following attributes in
> the AsterixDB configuration file:
> storage.memorycomponent.pagesize (Default 128K)
> storage.memorycomponent.numpages (Default 256 pages)
> Note: different attributes are used for Metadata datasets.
> During query compilation, any index that will be accessed uses
> AbstractLSMIndexDataflowHelperFactory which is passed an instance of
> AsterixVirtualBufferCacheProvider.
> Each dataset has a single AsterixVirtualBufferCacheProvider, so all indexes
> of this dataset and their partitions (different IO devices) on the same node
> share the same dataset VBC.
> During runtime, when the AbstractLSMIndexDataflowHelperFactory is used to
> create the actual IndexDataflowHelper, the dataset VBC is initialized. The
> total VBC budget of the dataset is divided into a number of VBCs, which is
> configurable in the AsterixDB configuration file as:
> storage.memorycomponent.numcomponents (Default 2 VBCs)
> Each one of those VBCs is created as an object of type
> MultitenantVirtualBufferCache (MVBC) (in
> DatasetLifecycleManager#initializeDatasetVirtualBufferCache). The size of
> each of these MVBCs is (storage.memorycomponent.numpages /
> storage.memorycomponent.numcomponents) pages. Even though the dataset VBCs
> have been initialized, no memory is allocated yet. This avoids allocating
> memory for queries that only read from disk or for bulk-load DDLs.
> Upon the first modification (in LSMHarness#modify) of any index partition
> that belongs to this dataset, we allocate the memory on all MVBCs that we
> initialized earlier. This makes all indexes and their partitions of the
> dataset on the same node compete for the budget of a single MVBC at a time.
> Once an MVBC is full, all files opened in it are scheduled to be flushed, and
> we switch to another MVBC (if any is available). This results in frequent
> flushes of many small files, which in turn leads to frequent merges.
> I think that it would be better if each partition (IO device on the node) had
> its own MVBC budget.
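
To make the budget arithmetic in the description concrete, here is a minimal sketch using the default values quoted above. It is not AsterixDB code: the class and method names are hypothetical, "128K" is assumed to mean 128 * 1024 bytes, and the example of 2 indexes over 4 partitions filling at the same rate is an assumption chosen only for illustration.

```java
// Hypothetical sketch of the budget arithmetic; not part of AsterixDB.
final class VbcBudgetSketch {
    // Defaults quoted in the issue description.
    static final int PAGE_SIZE_BYTES = 128 * 1024; // storage.memorycomponent.pagesize (assumed KiB)
    static final int NUM_PAGES = 256;              // storage.memorycomponent.numpages
    static final int NUM_COMPONENTS = 2;           // storage.memorycomponent.numcomponents

    /** Total VBC budget of one dataset on one node, in bytes. */
    static long datasetBudgetBytes(int pageSizeBytes, int numPages) {
        return (long) pageSizeBytes * numPages;
    }

    /** Pages assigned to each MVBC: numpages / numcomponents. */
    static int pagesPerMvbc(int numPages, int numComponents) {
        return numPages / numComponents;
    }

    /**
     * Average pages per index partition when every index partition of the
     * dataset shares one MVBC and fills it at the same rate (an assumption).
     */
    static int avgPagesPerWriter(int pagesPerMvbc, int indexes, int partitions) {
        return pagesPerMvbc / (indexes * partitions);
    }

    public static void main(String[] args) {
        long total = datasetBudgetBytes(PAGE_SIZE_BYTES, NUM_PAGES);
        int perMvbc = pagesPerMvbc(NUM_PAGES, NUM_COMPONENTS);
        // Illustrative node layout: 2 indexes, 4 partitions (IO devices).
        int perWriter = avgPagesPerWriter(perMvbc, 2, 4);

        System.out.println("Dataset VBC budget (bytes): " + total);
        System.out.println("Pages per MVBC: " + perMvbc);
        System.out.println("Avg pages per index partition at flush: " + perWriter);
        System.out.println("Avg flushed component size (bytes): "
                + (long) perWriter * PAGE_SIZE_BYTES);
    }
}
```

Under these assumptions, the more indexes and partitions share a single MVBC, the smaller each flushed file becomes, which matches the frequent small flushes and merges described above.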