Hi Simran,

Thanks a lot for answering my questions.
I'm sorry I have been slow to reply.

There was a copy/paste mistake in the Gist, which is why gfi_s was
undefined. The Gist has been updated to better reflect the query:
https://gist.github.com/Waateur/6c8c0b2c40d0dfa08cecfe780275cd9f

The COLLECT below usually returns fewer than 5 results:
  COLLECT code=b.code, category_id=b.category_id INTO code_group

You are right about this filter: FILTER LENGTH(a) == 1 is used to remove b
documents that have no match in a.

About the data distribution, this is what I have in mind:
A contains 31,025,557 docs and B contains 20,785,230 docs.
90% of A is linked to one or more docs in B.
60% of A is linked to more than one B (mostly 3 and almost never over 5).
20% of B is linked to more than one A (usually 2 or 3).
These duplicates are differentiated by the delivery string, which holds
a "date of upload" value.

As for example documents, they have been added to the Gist.
I don't know whether any other information would be useful.


Best,
Killian

On Mon, Nov 12, 2018 at 15:03, Simran Brucherseifer <sim...@arangodb.com>
wrote:

> Hi Killian,
>
> it would be helpful to see your exact index definitions, some example
> documents and to know a bit about the data distribution.
>
> How many documents end up in one group here on average?
>
>  COLLECT code=b.code, category_id=b.category_id INTO code_group
>
>
> What do you actually want to return here? "gfi_s" is not defined in the
> Gist:
>
> FOR a_tmp IN A
> ...
> RETURN gfi_s
>
>
> What is this for? The sub-query is limited to one result anyway. Is this
> for the case of no match or a missing attribute (gfi_s, see above)?
>
> FILTER LENGTH(a) == 1
>
> Best, Simran
>
> --
> You received this message because you are subscribed to the Google Groups
> "ArangoDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to arangodb+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 

Killian Janod
Datascientist @  iSmart / Kware
killian.ja...@kware.fr
killian.ja...@ismart.fr <killian.ja...@kware.fr>
+33 (0) 6 61 33 34 76

