Hi Nigel,

did you define any indexes? For movies.csv, it would probably make sense to 
use movieId as _key. The primary index always covers _key, which spares 
you an additional index.

Another way would be to assume that all tags are written in the same case, or 
to update them to be all lowercase, for instance. You could then get rid of the 
calls to LOWER() and create an index on movieId, tag in collection TagLinks.
The index can then be utilized for the following filter condition:

FILTER m.movieId == tl.movieId AND t.tag == tl.tag
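
A minimal sketch of that second approach, meant to be pasted into arangosh. 
The function name prepareTagLinks is made up for illustration, and I am 
assuming that both Tags and TagLinks store the text in an attribute called 
tag, as the filter above suggests. Note that it rewrites the stored tags, so 
try it on a copy of the data first.

```javascript
// Hypothetical helper: `db` is the database handle that arangosh provides.
function prepareTagLinks(db) {
  // Normalize the tag text on both sides of the comparison so that
  // the query no longer needs to call LOWER().
  db._query("FOR tl IN TagLinks UPDATE tl WITH { tag: LOWER(tl.tag) } IN TagLinks");
  db._query("FOR t IN Tags UPDATE t WITH { tag: LOWER(t.tag) } IN Tags");

  // Combined index on movieId and tag, matching both equality conditions.
  return db.TagLinks.ensureIndex({ type: "hash", fields: ["movieId", "tag"] });
}
```

In arangosh you would then call prepareTagLinks(db) once before running the 
query.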

Be sure to have a look at the execution plan to see what will be done:

Execution plan:
 Id   NodeType                       Est.   Comment
  1   SingletonNode                     1   * ROOT
  3   EnumerateCollectionNode       10000     - FOR m IN Movies   /* full collection scan */
  4   EnumerateCollectionNode   100000000       - FOR t IN Tags   /* full collection scan */
  7   CalculationNode           100000000         - LET #6 = { "_from" : m.`_id`, "_to" : t.`_id` }   /* simple expression */   /* collections used: m : Movies, t : Tags */
  9   IndexNode                 100000000         - FOR tl IN TagLinks   /* hash index scan, scan only */
  8   InsertNode                        0           - INSERT #6 IN MyTags


Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields                 Ranges
  9   hash   TagLinks     false    false        91.46 %   [ `movieId`, `tag` ]   ((m.`movieId` == tl.`movieId`) && (t.`tag` == tl.`tag`))
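
To print a plan like this yourself, something along the following lines 
should work in arangosh. The query text is my reconstruction from the plan 
nodes, not necessarily your exact query, and the names TAG_JOIN_QUERY and 
explainTagJoin are made up for illustration.

```javascript
// Query reconstructed from the plan: one loop per collection, the combined
// filter on TagLinks, and an insert of the resulting link into MyTags.
const TAG_JOIN_QUERY = `
FOR m IN Movies
  FOR t IN Tags
    FOR tl IN TagLinks
      FILTER m.movieId == tl.movieId AND t.tag == tl.tag
      INSERT { _from: m._id, _to: t._id } IN MyTags
`;

// Hypothetical helper: `db` is the database handle that arangosh provides.
// db._explain() prints the plan and the indexes used without running the query.
function explainTagJoin(db) {
  db._explain(TAG_JOIN_QUERY);
}
```

Calling explainTagJoin(db) before and after creating the index shows whether 
the TagLinks loop turns from a full collection scan into an index scan.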

