I believe we already support upsert feeds ;)
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql

> On May 8, 2017, at 12:04, Jianfeng Jia <[email protected]> wrote:
>
> I also observe this getting slower problem every-time when we re-ingest the
> twitter data. One difference is that the duplicate key could happen, and we
> know that is indeed duplicate record. To skip the search, we would expect an
> “upsert” logic ( just replace the old one :-) ) instead of an insert.
>
> Then maybe we can add some configuration in feed configuration like
>
> create feed MessageFeed using localfs(
> ("format"="adm"),
> ("type-name"="typeX"),
> ("upsert"="true")
> );
>
> to indicate that this feed using the upsert logic instead of insert.
>
> One thing we need to confirm is that if “upsert” is actually implemented in a
> no-search fashion?
> Based on the way we searching the components, only the most recent one will
> be popped out. Then blindly insert should be OK logically. Correct me if I
> missed some other cases (highly likely :-)).
>
>
>> On May 8, 2017, at 11:05 AM, Mike Carey <[email protected]> wrote:
>>
>> +0.99 from me.
>>
>>
>> On 5/8/17 9:50 AM, Taewoo Kim wrote:
>>> +1 for auto-generated ID case
>>>
>>> Best,
>>> Taewoo
>>>
>>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <[email protected]> wrote:
>>>
>>>> Abdullah has a pending change that disables searches if there's no
>>>> secondary indexes [1].
>>>> Auto-generated ID could be another case for which we can disable searches
>>>> as well.
>>>>
>>>> Best,
>>>> Yingyi
>>>>
>>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/
>>>>
>>>>
>>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>> I'm noticing a behavior during the ingestion is that it's getting slower by
>>>>> time. I know that is an expected behavior in LSM-indexes. But what I'm
>>>>> seeing is that I can notice the drop in ingestion rate roughly after having
>>>>> 10 components (around ~13 GB). That's what I'm not sure if it's expected?
>>>>>
>>>>> I tried multiple setups (increasing Memory component size +
>>>>> max-mergable-component-size). All of which delayed the problem but not
>>>>> solved it. The only part I've never changed is the bloom-filter
>>>>> false-positive rate (1%). Which I want to investigate next.
>>>>>
>>>>> So..
>>>>> What I want to suggest is that when the primary key is auto-generated, why
>>>>> AsterixDB looks for duplicates? it seems a wasteful operation to me. Also,
>>>>> can we give the user the ability to tell the index that all keys are unique
>>>>> ? I know I should not trust the user .. but in certain cases, probably the
>>>>> user is certain that the key is unique. Or a more elegant solution can
>>>>> shine in the end :-)
>>>>>
>>>>> --
>>>>>
>>>>> *Regards,*
>>>>> Wail Alkowaileet
>>>>>
>>
>
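
On the question quoted above about whether a blind write is logically safe when components are searched newest-first, here is a minimal Python sketch of the idea. It is not AsterixDB code: ToyLSM and all of its methods are hypothetical names invented for this example, and it ignores bloom filters, merges, and secondary indexes (which may still need the old record).

```python
class ToyLSM:
    """Hypothetical toy LSM index: one mutable memory component plus
    immutable flushed components, kept newest first."""

    def __init__(self):
        self.memory = {}
        self.flushed = []   # flushed components, newest first
        self.probes = 0     # how many components have been searched so far

    def flush(self):
        # Freeze the memory component and start a fresh one.
        self.flushed.insert(0, self.memory)
        self.memory = {}

    def lookup(self, key):
        # Search newest to oldest; the first hit wins, so older
        # versions of the same key are never returned.
        for component in [self.memory] + self.flushed:
            self.probes += 1
            if key in component:
                return component[key]
        return None

    def insert(self, key, value):
        # Uniqueness check: a new key has to miss in every component.
        if self.lookup(key) is not None:
            raise KeyError(f"duplicate key: {key}")
        self.memory[key] = value

    def upsert(self, key, value):
        # Blind write: no search; the newest version shadows older ones.
        self.memory[key] = value


index = ToyLSM()
for i in range(3):
    index.insert(i, f"v{i}")
    index.flush()
insert_probes = index.probes                  # grows with the component count

index.upsert(0, "new")
upsert_probes = index.probes - insert_probes  # the blind write searched nothing
latest = index.lookup(0)                      # served from the memory component
```

In this toy model each insert of a new key probes every existing component, which is the slowdown described in the thread, while the upsert touches only the memory component and a later lookup still sees the new value.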
Best regards, Ildar
