Moving this to the incubator dev list (just looking it over again). Q:
when a merge finishes and there's a bigger backlog of components, will
it currently consider doing a wider merge? (Instead of 5, if there are
13 components sitting there when the 5-way merge finishes, will a
13-way merge be initiated?) Just curious. We probably do need to think
about some sort of better flow control here - the storage gate should
presumably slow down admissions if it can't keep up - I have to ponder
what that might mean. (I have a better idea of what it could mean for
feeds than for normal inserts.) One could argue that an increasing
backlog is a sign that we should be scaling out the number of
partitions for the dataset (future work, but important work :-)).
Cheers,
Mike
On 4/15/15 2:33 PM, [email protected] wrote:
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 868 by [email protected]: About prefix merge policy behavior
https://code.google.com/p/asterixdb/issues/detail?id=868
I describe how the current prefix merge policy works based on
observations from ingestion experiments.
Similar behavior was observed by Sattam as well.
The observed behavior seems a bit unexpected, so I'm posting the
observation here so we can consider a better merge policy and/or a
better LSM index design with respect to merge operations.
The AQL statements used for the experiment are shown at the end of
this write-up.
The prefix merge policy decides to merge disk components based on the
following conditions:
1. Look at the candidate components for merging in oldest-first
order. If any exist, identify the prefix of that sequence for which
the sum of the component sizes exceeds MaxMergableComponentSize, and
schedule a merge of those components into a new component.
2. If no merge happens under 1, check whether the number of candidate
components for merging exceeds MaxToleranceComponentCnt. If so,
schedule a merge of all of the current candidates into a single new
component.
Also, the prefix merge policy doesn't allow concurrent merge
operations on a single index partition.
In other words, if there is already a scheduled or ongoing merge
operation, a new merge is not scheduled even when the above conditions
are met.
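To make the decision procedure concrete, here is a minimal sketch in
Java (AsterixDB's implementation language). This is not the actual
PrefixMergePolicy source; the names and structure are hypothetical,
and condition 2 is written as ">=" rather than strictly ">" so that it
matches the 5-component scenario described below.

// Hypothetical, simplified sketch of the decision procedure described
// above -- not the actual AsterixDB PrefixMergePolicy code.
import java.util.List;

class PrefixMergePolicySketch {
    long maxMergableComponentSize; // e.g., 1GB, as in the experiment
    int maxToleranceComponentCnt;  // e.g., 5

    // componentSizes holds the sizes of the candidate disk components,
    // ordered oldest-first. Returns how many of the oldest components
    // to merge into a new component, or 0 if no merge is scheduled.
    int componentsToMerge(List<Long> componentSizes, boolean mergeOngoing) {
        // No concurrent merges on a single index partition: if a merge
        // is already scheduled or ongoing, do nothing.
        if (mergeOngoing) {
            return 0;
        }
        // Condition 1: find the shortest oldest-first prefix whose
        // total size exceeds MaxMergableComponentSize.
        long sum = 0;
        for (int i = 0; i < componentSizes.size(); i++) {
            sum += componentSizes.get(i);
            if (sum > maxMergableComponentSize) {
                return i + 1;
            }
        }
        // Condition 2: too many small candidates; reaching the
        // tolerance count triggers a merge of all of them.
        if (componentSizes.size() >= maxToleranceComponentCnt) {
            return componentSizes.size();
        }
        return 0;
    }
}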
Based on this merge policy, the following situation can occur.
Suppose MaxToleranceComponentCnt = 5 and 5 disk components have been
flushed to disk.
When the 5th disk component is flushed, the prefix merge policy
schedules a merge operation to merge the 5 components.
While that merge operation is scheduled and running, concurrently
ingested records generate more disk components.
As long as a merge operation is not fast enough to keep up with the
rate at which incoming ingested records generate 5 disk components,
the number of disk components increases over time.
So, the slower the merge operations are, the more disk components
there will be as time goes on.
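To illustrate the growth, here is a tiny simulation sketch in Java.
All of the numbers (one flush every 10 seconds, an 80-second 5-way
merge) are made up for illustration and are not measurements from the
experiment; they just satisfy the "merge slower than 5 flushes"
condition above.

// Hypothetical back-of-the-envelope simulation of the backlog growth.
// Assumed numbers: one flush every 10s, one 5-way merge taking 80s,
// and at most one merge running at a time.
class BacklogSketch {
    public static void main(String[] args) {
        final int flushSec = 10;  // a new disk component every 10s
        final int mergeSec = 80;  // a 5-way merge takes 80s
        final int tolerance = 5;  // MaxToleranceComponentCnt
        int components = 0;       // current disk component count
        int mergeRemaining = 0;   // seconds left in the ongoing merge
        for (int t = flushSec; t <= 600; t += flushSec) {
            components++; // a flush completes and adds one component
            if (mergeRemaining > 0) {
                mergeRemaining -= flushSec;
                if (mergeRemaining <= 0) {
                    components -= tolerance - 1; // 5 merged into 1
                }
            }
            if (mergeRemaining <= 0 && components >= tolerance) {
                mergeRemaining = mergeSec; // schedule the next merge
            }
            System.out.printf("t=%3ds components=%d%n", t, components);
        }
    }
}

With these made-up numbers, each 80-second merge cycle retires 4
components while 8 new ones are flushed, so the printed count grows by
roughly 4 per cycle - exactly the "slower merges, more components"
effect described above.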
I also attached the output of the command "ls -alR <directory of the
asterixdb instance for an ingestion experiment>", which was executed
after the ingestion was over.
The attached file shows that for the primary index (whose directory is
FsqCheckinTweet_idx_FsqCheckinTweet), ingestion generated 20 disk
components, where each disk component consists of a btree (the
filename has suffix _b) and a bloom filter (the filename has suffix
_f), and MaxMergableComponentSize is set to 1GB.
It also shows that for the secondary index (whose directory is
FsqCheckinTweet_idx_sifCheckinCoordinate), ingestion generated more
than 1400 components, where each disk component consists of a
dictionary btree (suffix: _b), an inverted list (suffix: _i), a
deleted-key btree (suffix: _d), and a bloom filter for the deleted-key
btree (suffix: _f).
Even though the ingestion is over, since our merge operations happen
asynchronously, merging continues and eventually merges all mergable
disk components according to the described merge policy.
------------------------------------------
AQLs for the ingestion experiment
------------------------------------------
drop dataverse STBench if exists;
create dataverse STBench;
use dataverse STBench;
create type FsqCheckinTweetType as closed {
id: int64,
user_id: int64,
user_followers_count: int64,
text: string,
datetime: datetime,
coordinates: point,
url: string?
};
create dataset FsqCheckinTweet (FsqCheckinTweetType) primary key id;
/* this index type is only available in the kisskys/hilbertbtree
branch; however, you can easily replace the sif index with an inverted
keyword index on the text field and you will see similar behavior */
create index sifCoordinate on FsqCheckinTweet(coordinates) type
sif(-180.0, -90.0, 180.0, 90.0);
/* create feed */
create feed TweetFeed
using file_feed
(("fs"="localfs"),
("path"="127.0.0.1:////Users/kisskys/Data/SynFsqCheckinTweet.adm"),
("format"="adm"),
("type-name"="FsqCheckinTweetType"),
("tuple-interval"="0"));
/* connect feed */
use dataverse STBench;
set wait-for-completion-feed "true";
connect feed TweetFeed to dataset FsqCheckinTweet;
Attachments:
storage-layout.txt 574 KB