A general question is: if the speed of incoming records is higher than the merge and disk-flush speed, what should we do? Can we reject the insertion requests?
Chen On Sun, May 17, 2015 at 5:38 PM, Michael Carey <[email protected]> wrote: > Moving this to the incubator dev list (just looking it over again). Q - if > when a merge finishes there's a bigger backlog of components - will it > currently consider doing a more-ways merge? (Instead of 5, if there are 13 > sitting there when the 5-merge finishes - will a 13-merge be initiated?) > Just curious. We do probably need to think about some sort of better flow > control here - the storage gate should presumably slow down admissions if it > can't keep up - have to ponder what that might mean. (I have a better idea > of what it could mean for feeds than for normal inserts.) One could argue > that an increasing backlog is a sign that we should be scaling out the > number of partitions for the dataset (future work but important work :-)). > > Cheers, > Mike > > On 4/15/15 2:33 PM, [email protected] wrote: > > Status: New > Owner: ---- > Labels: Type-Defect Priority-Medium > > New issue 868 by [email protected]: About prefix merge policy behavior > https://code.google.com/p/asterixdb/issues/detail?id=868 > > I describe how current prefix merge policy works based on the observation > from ingestion experiments. > Also, the similar observation was observed by Sattam as well. > The observed behavior seems a bit unexpected, so I post the observation here > to consider better merge policy and/or better lsm index design regarding > merge operations. > > The aqls used for the experiment are shown at the end of this writing. > > Prefix merge policy decides to merge disk components based on the following > conditions > 1. Look at the candidate components for merging in oldest-first order. If > one exists, identify the prefix of the sequence of all such components for > which the sum of their sizes exceeds MaxMergableComponentSize. Schedule a > merge of those components into a new component. > 2. If a merge from 1 doesn't happen, see if the set of candidate components > for merging exceeds MaxToleranceComponentCnt. If so, schedule a merge all > of the current candidates into a new single component. > Also, the prefix merge policy doesn't allow concurrent merge operations for > a single index partition. > In other words, if there is a scheduled or an on-going merge operation, even > if the above conditions are met, the merge operation is not scheduled. > > Based on this merge policy, the following situation can occur. > Suppose MaxToleranceCompCnt = 5 and 5 disk components were flushed to disk. > When 5th disk component is flushed, the prefix merge policy schedules a > merge operation to merge the 5 components. > During the merge operation is scheduled and starts merging, concurrently > ingested records generates more disk components. > As long as a merge operation is not fast enough to catch up the speed of > generating 5 disk components by incoming ingested records, > the number of disk components increases as time goes. > So, the slower merge operations are, the more disk components there will be > as time goes. > > I also attached a result of a command, "ls -alR <directory of the asterixdb > instance for an ingestion experiment>" which was executed after the > ingestion is over. > The attached file shows that for primary index (whose directory is > FsqCheckinTweet_idx_FsqCheckinTweet), ingestion generated 20 disk > components, where each disk component consists of btree (the filename has > suffix _b) and bloom filter (the filename has suffix_f) and > MaxMergableComponentSize is set to 1GB. > It also shows that for the secondary index (whose directory is > FsqCheckinTweet_idx_sifCheckinCoordinate), ingestion generated more than > 1400 components, where each disk component consist of a dictionary btree > (suffix: _b), an inverted list (suffix: _i), a deleted-key btree (suffix: > _d), and a bloom filter for the deleted-key btree (suffix: _f). > Even if the ingestion was over, since our merge operation happens > asynchronously, the merge operation continues and eventually merge all > mergable disk components according to the describe merge policy. > > ------------------------------------------ > AQLs for the ingestion experiment > ------------------------------------------ > drop dataverse STBench if exists; > create dataverse STBench; > use dataverse STBench; > > create type FsqCheckinTweetType as closed { > id: int64, > user_id: int64, > user_followers_count: int64, > text: string, > datetime: datetime, > coordinates: point, > url: string? > } > create dataset FsqCheckinTweet (FsqCheckinTweetType) primary key id > > /* this index type is only available kisskys/hilbertbtree branch. however, > you can easily replace sif index to inverted keyword index on the text field > and you will see similar behavior */ > create index sifCoordinate on FsqCheckinTweet(coordinates) type sif(-180.0, > -90.0, 180.0, 90.0); > > /* create feed */ > create feed TweetFeed > using file_feed > (("fs"="localfs"), > ("path"="127.0.0.1:////Users/kisskys/Data/SynFsqCheckinTweet.adm"),("format"="adm"),("type-name"="FsqCheckinTweetType"),("tuple-interval"="0")); > > /* connect feed */ > use dataverse STBench; > set wait-for-completion-feed "true"; > connect feed TweetFeed to dataset FsqCheckinTweet; > > > > > Attachments: > storage-layout.txt 574 KB > > > -- > You received this message because you are subscribed to the Google Groups > "asterixdb-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout.
