Hello! Quick heads-up: the new streaming feature completely solved my issue! Because the dataset is big, requesting a large part of it at once took a lot of RAM. I might have managed with LIMIT and OFFSET, but the query itself is slow, and running it multiple times was really slow. So using streaming with 3.4 saved the day. Nice job, ArangoDB team! This software is really awesome, and every new feature is well made!
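For anyone finding this thread later, here is a minimal sketch of what consuming a streaming cursor can look like over the HTTP API, as opposed to re-running the query with LIMIT and OFFSET. It assumes the `options.stream` flag of `POST /_api/cursor` introduced in 3.4. The `send` callable is a hypothetical transport helper (not part of any driver) that performs the request and returns the decoded JSON response, and the batch size is an arbitrary choice.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport signature: send(method, path, payload) -> decoded JSON dict.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def build_stream_cursor_request(
    query: str,
    batch_size: int = 1000,
    bind_vars: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the body for POST /_api/cursor with streaming enabled."""
    return {
        "query": query,
        "bindVars": bind_vars or {},
        "batchSize": batch_size,      # documents returned per round trip
        "options": {"stream": True},  # ask the server to produce results lazily
    }


def iter_cursor(send: Send, body: Dict[str, Any]) -> Iterator[Any]:
    """Yield documents one batch at a time, so only one batch is held in memory."""
    response = send("POST", "/_api/cursor", body)
    while True:
        yield from response["result"]
        if not response.get("hasMore"):
            break
        # Fetch the next batch of the same cursor; the query is not re-executed.
        response = send("PUT", "/_api/cursor/" + response["id"], None)
```

The query runs once and each round trip only fetches the next batch, whereas every LIMIT/OFFSET page repeats the slow query from the start.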
On Fri, 16 Nov 2018 at 11:02, Killian Janod <killian.ja...@ismart.fr> wrote:

> Hi Simran, thanks a lot for answering my questions.
> I'm sorry I have been slow to answer back.
>
> There was a copy/paste mistake in the Gist. That's why there was an
> undefined gfi_s. The Gist has been updated to better reflect the query.
> https://gist.github.com/Waateur/6c8c0b2c40d0dfa08cecfe780275cd9f
>
> The COLLECT below usually returns under 5 results:
>
>     COLLECT code=b.code, category_id=b.category_id INTO code_group
>
> You are right about this filter: FILTER LENGTH(a) == 1 is used to remove
> b documents that have no match in a.
>
> About the data distribution, this is what I have in mind:
> A contains 31,025,557 docs and B 20,785,230 docs.
> 90% of A is linked to one or more docs in B.
> 60% of A is linked to more than one B (mostly 3 and almost never over 5).
> 20% of B is linked to more than one A (usually 2 or 3).
> These kinds of duplicates are differentiated by the delivery string, which
> is a "date of upload" information.
>
> As for examples of documents, they were added to the Gist.
> I don't know if any other information would be useful.
>
> Best,
> Killian
>
> On Mon, 12 Nov 2018 at 15:03, Simran Brucherseifer <sim...@arangodb.com>
> wrote:
>
>> Hi Killian,
>>
>> it would be helpful to see your exact index definitions, some example
>> documents and to know a bit about the data distribution.
>>
>> How many documents end up in one group here on average?
>>
>>     COLLECT code=b.code, category_id=b.category_id INTO code_group
>>
>> What do you actually want to return here? "gfi_s" is not defined in the
>> Gist:
>>
>>     FOR a_tmp IN A
>>     ...
>>     RETURN gfi_s
>>
>> What is this for? The sub-query is limited to one result anyway. Is this
>> for the case of no match or a missing attribute (gfi_s, see above)?
>>
>>     FILTER LENGTH(a) == 1
>>
>> Best, Simran
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "ArangoDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to arangodb+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Killian Janod
> Datascientist @ iSmart / Kware
> killian.ja...@kware.fr
> killian.ja...@ismart.fr <killian.ja...@kware.fr>
> +33 (0) 6 61 33 34 76

--
Killian Janod
Datascientist @ iSmart / Kware
killian.ja...@kware.fr
killian.ja...@ismart.fr <killian.ja...@kware.fr>
+33 (0) 6 61 33 34 76