Aha, never knew that before. We will definitely try upsert feed next time! Thanks for pointing it out!
> On May 8, 2017, at 12:07 PM, Ildar Absalyamov <[email protected]> wrote:
>
> I believe we already support upsert feeds ;)
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql
>
>> On May 8, 2017, at 12:04, Jianfeng Jia <[email protected]> wrote:
>>
>> I also observe this getting slower problem every-time when we re-ingest the
>> twitter data. One difference is that the duplicate key could happen, and we
>> know that is indeed duplicate record. To skip the search, we would expect an
>> “upsert” logic ( just replace the old one :-) ) instead of an insert.
>>
>> Then maybe we can add some configuration in feed configuration like
>>
>> create feed MessageFeed using localfs(
>>     ("format"="adm"),
>>     ("type-name"="typeX"),
>>     ("upsert"="true")
>> );
>>
>> to indicate that this feed using the upsert logic instead of insert.
>>
>> One thing we need to confirm is that if “upsert” is actually implemented in
>> a no-search fashion?
>> Based on the way we searching the components, only the most recent one will
>> be popped out. Then blindly insert should be OK logically. Correct me if I
>> missed some other cases (highly likely :-)).
>>
>>
>>> On May 8, 2017, at 11:05 AM, Mike Carey <[email protected]> wrote:
>>>
>>> +0.99 from me.
>>>
>>>
>>> On 5/8/17 9:50 AM, Taewoo Kim wrote:
>>>> +1 for auto-generated ID case
>>>>
>>>> Best,
>>>> Taewoo
>>>>
>>>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <[email protected]> wrote:
>>>>
>>>>> Abdullah has a pending change that disables searches if there's no
>>>>> secondary indexes [1].
>>>>> Auto-generated ID could be another case for which we can disable searches
>>>>> as well.
>>>>>
>>>>> Best,
>>>>> Yingyi
>>>>>
>>>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/
>>>>>
>>>>>
>>>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <[email protected]> wrote:
>>>>>
>>>>>> Hi Devs,
>>>>>>
>>>>>> I'm noticing a behavior during the ingestion is that it's getting slower
>>>>>> by time. I know that is an expected behavior in LSM-indexes. But what I'm
>>>>>> seeing is that I can notice the drop in ingestion rate roughly after
>>>>>> having 10 components (around ~13 GB). That's what I'm not sure if it's
>>>>>> expected?
>>>>>>
>>>>>> I tried multiple setups (increasing Memory component size +
>>>>>> max-mergable-component-size). All of which delayed the problem but not
>>>>>> solved it. The only part I've never changed is the bloom-filter
>>>>>> false-positive rate (1%). Which I want to investigate next.
>>>>>>
>>>>>> So..
>>>>>> What I want to suggest is that when the primary key is auto-generated,
>>>>>> why AsterixDB looks for duplicates? it seems a wasteful operation to me.
>>>>>> Also, can we give the user the ability to tell the index that all keys
>>>>>> are unique ? I know I should not trust the user .. but in certain cases,
>>>>>> probably the user is certain that the key is unique. Or a more elegant
>>>>>> solution can shine in the end :-)
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Regards,*
>>>>>> Wail Alkowaileet
>>>>>>
>>>
>>
>
> Best regards,
> Ildar
>
