Actually, Abdullah’s patch is going to lift this requirement as well (given that there are no secondaries).
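
To make the point concrete, here is a minimal toy model of the behavior discussed in this thread. It is only a sketch under stated assumptions, not AsterixDB code: the class `ToyLSM`, its `has_secondaries` flag, and the `searches` counter are hypothetical names invented for illustration. It assumes that an insert must search to enforce key uniqueness, while an upsert needs the old record only when secondary indexes have to be updated.

```python
class ToyLSM:
    """Toy LSM primary index (hypothetical, for illustration only).

    One mutable in-memory component plus immutable flushed components.
    Stored values are assumed to never be None.
    """

    def __init__(self, has_secondaries, memory_limit=2):
        self.has_secondaries = has_secondaries
        self.memory_limit = memory_limit
        self.memory = {}      # current in-memory component
        self.disk = []        # flushed components, newest first
        self.searches = 0     # number of components probed so far

    def _search(self, key):
        # Probe components from newest to oldest; the first hit wins,
        # so only the most recent version of a key is ever returned.
        for component in [self.memory] + self.disk:
            self.searches += 1
            if key in component:
                return component[key]
        return None

    def _put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self.disk.insert(0, self.memory)  # flush to a new component
            self.memory = {}

    def insert(self, key, value):
        # Insert has to reject duplicates, so it always searches first.
        if self._search(key) is not None:
            raise KeyError(f"duplicate key: {key}")
        self._put(key, value)

    def upsert(self, key, value):
        # The old record is only needed to update secondary indexes.
        # With no secondaries, the search can be skipped entirely.
        old = self._search(key) if self.has_secondaries else None
        self._put(key, value)
        return old


blind = ToyLSM(has_secondaries=False)
indexed = ToyLSM(has_secondaries=True)
for k in range(6):
    blind.upsert(k, "v")
    indexed.upsert(k, "v")
print(blind.searches, indexed.searches)
```

In this model the cost of each search grows with the number of flushed components, which mirrors the slowdown reported at the start of the thread; the blind upsert path never pays it.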
> On May 8, 2017, at 12:39, Mike Carey <[email protected]> wrote:
>
> Note that upserts don't avoid searches.... (Still need to get the old record
> to update secondary indexes from.)
>
>
> On 5/8/17 12:10 PM, Jianfeng Jia wrote:
>> Aha, never knew that before. We will definitely try upsert feed next time!
>> Thanks for pointing it out!
>>
>>> On May 8, 2017, at 12:07 PM, Ildar Absalyamov <[email protected]>
>>> wrote:
>>>
>>> I believe we already support upsert feeds ;)
>>> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql
>>>
>>>> On May 8, 2017, at 12:04, Jianfeng Jia <[email protected]> wrote:
>>>>
>>>> I also observe this getting slower problem every-time when we re-ingest
>>>> the twitter data. One difference is that the duplicate key could happen,
>>>> and we know that is indeed duplicate record. To skip the search, we would
>>>> expect an “upsert” logic ( just replace the old one :-) ) instead of an
>>>> insert.
>>>>
>>>> Then maybe we can add some configuration in feed configuration like
>>>>
>>>> create feed MessageFeed using localfs(
>>>> ("format"="adm"),
>>>> ("type-name"="typeX"),
>>>> ("upsert"="true")
>>>> );
>>>>
>>>> to indicate that this feed using the upsert logic instead of insert.
>>>>
>>>> One thing we need to confirm is that if “upsert” is actually implemented
>>>> in a no-search fashion?
>>>> Based on the way we searching the components, only the most recent one
>>>> will be popped out. Then blindly insert should be OK logically. Correct me
>>>> if I missed some other cases (highly likely :-)).
>>>>
>>>>
>>>>> On May 8, 2017, at 11:05 AM, Mike Carey <[email protected]> wrote:
>>>>>
>>>>> +0.99 from me.
>>>>>
>>>>>
>>>>> On 5/8/17 9:50 AM, Taewoo Kim wrote:
>>>>>> +1 for auto-generated ID case
>>>>>>
>>>>>> Best,
>>>>>> Taewoo
>>>>>>
>>>>>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <[email protected]> wrote:
>>>>>>
>>>>>>> Abdullah has a pending change that disables searches if there's no
>>>>>>> secondary indexes [1].
>>>>>>> Auto-generated ID could be another case for which we can disable
>>>>>>> searches as well.
>>>>>>>
>>>>>>> Best,
>>>>>>> Yingyi
>>>>>>>
>>>>>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Devs,
>>>>>>>>
>>>>>>>> I'm noticing a behavior during the ingestion is that it's getting
>>>>>>>> slower by time. I know that is an expected behavior in LSM-indexes.
>>>>>>>> But what I'm seeing is that I can notice the drop in ingestion rate
>>>>>>>> roughly after having 10 components (around ~13 GB). That's what I'm
>>>>>>>> not sure if it's expected?
>>>>>>>>
>>>>>>>> I tried multiple setups (increasing Memory component size +
>>>>>>>> max-mergable-component-size). All of which delayed the problem but not
>>>>>>>> solved it. The only part I've never changed is the bloom-filter
>>>>>>>> false-positive rate (1%). Which I want to investigate next.
>>>>>>>>
>>>>>>>> So..
>>>>>>>> What I want to suggest is that when the primary key is auto-generated,
>>>>>>>> why AsterixDB looks for duplicates? it seems a wasteful operation to
>>>>>>>> me. Also, can we give the user the ability to tell the index that all
>>>>>>>> keys are unique ? I know I should not trust the user .. but in certain
>>>>>>>> cases, probably the user is certain that the key is unique. Or a more
>>>>>>>> elegant solution can shine in the end :-)
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Regards,*
>>>>>>>> Wail Alkowaileet
>>>>>>>>
>>> Best regards,
>>> Ildar
>>>
>

Best regards,
Ildar
