Murtadha Hubail created ASTERIXDB-1337:
------------------------------------------

             Summary: Dataset Memory Management on Multi-Partition NC
                 Key: ASTERIXDB-1337
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1337
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: AsterixDB, Storage
            Reporter: Murtadha Hubail
            Priority: Minor


Currently, each dataset has a fixed memory budget (its total virtual buffer 
cache (VBC) budget), which is configurable through the following attributes in 
the asterix configuration file:
storage.memorycomponent.pagesize (default: 128 KB)
storage.memorycomponent.numpages (default: 256 pages)
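
With the defaults above, the per-dataset budget works out to 128 KB x 256 pages 
= 32 MB. A minimal sketch of that arithmetic (the class and method names below 
are made up for illustration only):

    // Illustration only: the per-dataset budget is simply pagesize * numpages.
    class DatasetBudgetSketch {
        // 131072 bytes/page * 256 pages = 33554432 bytes = 32 MB with the defaults
        static long datasetBudgetBytes(long pageSizeBytes, long numPages) {
            return pageSizeBytes * numPages;
        }
    }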

Note: a different set of attributes is used for Metadata datasets.

During query compilation, any index that will be accessed uses an 
AbstractLSMIndexDataflowHelperFactory, which is passed an instance of 
AsterixVirtualBufferCacheProvider.

Each dataset has a single AsterixVirtualBufferCacheProvider, which means that 
all indexes of the dataset, and all of their partitions (on different IO 
devices), on the same node access the same dataset VBC.
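
A hypothetical, simplified sketch of that sharing (the names below are 
stand-ins for illustration, not the actual AsterixDB classes):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Stand-in for a virtual buffer cache.
    interface VbcLike { }

    // Simplified registry: every index and every IO-device partition of a
    // dataset on this node resolves to the same list of dataset-level VBCs,
    // i.e. they all draw from one shared memory budget.
    class PerDatasetVbcRegistry {
        private final Map<Integer, List<VbcLike>> vbcsByDatasetId = new HashMap<>();

        List<VbcLike> getVirtualBufferCaches(int datasetId) {
            return vbcsByDatasetId.get(datasetId);
        }
    }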

At runtime, when the AbstractLSMIndexDataflowHelperFactory is used to create 
the actual IndexDataflowHelper, the dataset VBC is initialized. The total VBC 
budget of the dataset is divided into a number of VBCs, configurable in the 
asterix configuration file as:
storage.memorycomponent.numcomponents (default: 2 VBCs)

Each of those VBCs is created as an object of type 
MultitenantVirtualBufferCache (MVBC) in 
DatasetLifecycleManager#initializeDatasetVirtualBufferCache. The size of each 
MVBC is (storage.memorycomponent.numpages / 
storage.memorycomponent.numcomponents) pages. Even though the dataset VBCs have 
been initialized, no memory is allocated yet; this avoids allocating memory for 
disk-only read queries or bulk-load DDLs.
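
A hypothetical sketch of that division and of the deferred allocation (the real 
code lives in DatasetLifecycleManager#initializeDatasetVirtualBufferCache and 
uses MultitenantVirtualBufferCache; the classes below are simplified stand-ins):

    import java.util.ArrayList;
    import java.util.List;

    // Stand-in for an MVBC: created with a fixed page budget, but no memory
    // is actually reserved until allocate() is called.
    class MvbcSketch {
        final int pageSize;
        final int numPages;
        private boolean allocated = false;

        MvbcSketch(int pageSize, int numPages) {
            this.pageSize = pageSize;
            this.numPages = numPages;
        }

        void allocate() {
            allocated = true; // real page allocation happens here, lazily
        }
    }

    class DatasetVbcInitSketch {
        // pageSize, numPages, numComponents correspond to the three
        // storage.memorycomponent.* attributes mentioned above.
        static List<MvbcSketch> initialize(int pageSize, int numPages, int numComponents) {
            List<MvbcSketch> vbcs = new ArrayList<>();
            int pagesPerVbc = numPages / numComponents; // e.g. 256 / 2 = 128 pages each
            for (int i = 0; i < numComponents; i++) {
                vbcs.add(new MvbcSketch(pageSize, pagesPerVbc));
            }
            return vbcs; // initialized, but nothing allocated yet
        }
    }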

Upon the first modification (in LSMHarness#modify) of any index partition that 
belongs to this dataset, we allocate the memory for all MVBCs that were 
initialized earlier. This makes all indexes of the dataset, and all of their 
partitions, on the same node compete for the budget of a single MVBC at a time. 
Once an MVBC is full, all files opened in it are scheduled to be flushed, and 
we switch to another MVBC (if one is available). The effect is frequent flushes 
of many small files, which in turn leads to frequent merges.
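
A hypothetical sketch of that fill-flush-switch behavior (again, simplified 
stand-ins rather than the actual LSMHarness/MVBC code):

    import java.util.List;

    interface MvbcLike {
        boolean isFull();
        void scheduleFlushOfAllOpenFiles();
        void write();
    }

    // All memory components of a dataset on the node fill one MVBC at a time;
    // once it is full, everything opened in it is scheduled to flush and
    // writes move on to the next MVBC.
    class SharedBudgetSketch {
        private final List<MvbcLike> vbcs; // the dataset's MVBCs (2 by default)
        private int current = 0;

        SharedBudgetSketch(List<MvbcLike> vbcs) {
            this.vbcs = vbcs;
        }

        void modify() {
            if (vbcs.get(current).isFull()) {
                vbcs.get(current).scheduleFlushOfAllOpenFiles(); // many small files at once
                current = (current + 1) % vbcs.size();           // switch to the next MVBC
            }
            vbcs.get(current).write();
        }
    }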

I think it would be better if each partition (IO device on the node) had its 
own MVBC budget.
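
One way to read this proposal, sketched under the assumption that the existing 
dataset budget would simply be split evenly across the node's IO devices (the 
actual budgeting policy is of course open for discussion):

    // Hypothetical: pages available to each per-partition MVBC if the dataset
    // budget is split across both the memory components and the IO devices.
    class PerPartitionBudgetSketch {
        // e.g. 256 pages / 2 components / 4 IO devices = 32 pages per MVBC
        static int pagesPerPartitionVbc(int numPages, int numComponents, int numIoDevices) {
            return numPages / numComponents / numIoDevices;
        }
    }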



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
