COLLECT clears all variables in the current scope. Click on Explain in the web interface and inspect the execution plan:
<https://lh3.googleusercontent.com/-0P4-5oHCxII/We9inq0LnlI/AAAAAAAAAC4/hwHz7nP744kwImjRWLugN98A9z6U4jhyQCL4CGAYYCw/s1600/movie_ratings_query_plan.png>

Follow the CollectNode with Id 7 all the way up to the next SingletonNode - it's Id 3 and the ROOT of the current scope. The variables emitted by the graph traversal (m, e) and the variable for the genre iteration (g) are in that scope, which means you don't have access to them after COLLECT. The parent scope (which is the top-level scope) contains the iteration over all user documents. Variable u can still be accessed after COLLECT, because it is defined outside the scope that COLLECT clears.

If you want to know into which "buckets" values are grouped, use the COLLECT ... INTO syntax:
https://docs.arangodb.com/3.2/AQL/Operations/Collect.html

The COLLECT ... WITH COUNT INTO ... syntax is a shorthand if you want to group and count the number of occurrences (how many items per bucket, if you will). This syntax cannot be extended by an INTO clause, however. We still need the counts, so we need to rework the query a bit.

We could use the standard INTO syntax, but it would keep far more data than we need further down the query. All we actually need is the rating stored as an edge attribute. Thus, we can create a projection like so:

    COLLECT ... INTO r = e.rating

For every bucket (genre), we will have access to an array with the rating values via variable r. We had to remove the counting and need to add it back in a different way now. There are two options.

Post-calculation:

    COLLECT genre = g INTO r = e.rating
    RETURN LENGTH(r) // array length of ratings equals number of items in bucket

(What if there's no rating attribute though?)

Aggregation (can be more efficient, although it shouldn't make any difference in your case):

    COLLECT genre = g AGGREGATE count = LENGTH(1) INTO r = e.rating

For every item in a bucket, a counter is increased by one (used in conjunction with AGGREGATE, LENGTH counts one per item, no matter what you pass to it). AGGREGATE could also be used to find the minimum and maximum values as well as a few other statistical metrics, but that's not needed in this context.

The full query:

    FOR u IN users
      LET genreStats = MERGE(
        FOR m, e IN OUTBOUND u GRAPH 'ratedGraph' // get all movies a user is linked to
          OPTIONS {uniqueVertices: 'global', bfs: true} // ignore duplicate movies
          FOR g IN m.genre
            COLLECT genre = g AGGREGATE count = LENGTH(1) INTO r = e.rating // group by genre
            RETURN {[genre]: count * AVERAGE(r)} // return one object per genre (merged into single object by MERGE function in 2nd line)
      )
      FILTER LENGTH(genreStats) // don't update user documents which are not linked to any movie
      UPDATE u WITH {genreStats} IN users
      RETURN NEW

count is multiplied by the AVERAGE (mean) of the ratings per genre. There are also functions like MEDIAN which could be used instead:
https://docs.arangodb.com/3.2/AQL/Functions/Numeric.html
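
If it helps to see the arithmetic outside of AQL, here is a rough plain-Python sketch of what the inner subquery computes for a single user. It is only an emulation of the grouping logic, not how ArangoDB executes it. The sample data and the names `rated` and `genre_stats` are made up for illustration, and it assumes every edge has a numeric rating.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (movie, edge) pairs for one user, standing in for the
# m and e variables emitted by the graph traversal.
rated = [
    ({"genre": ["Drama", "Crime"]}, {"rating": 4}),
    ({"genre": ["Drama"]}, {"rating": 2}),
    ({"genre": ["Comedy"]}, {"rating": 5}),
]


def genre_stats(pairs):
    """Group ratings by genre and return count * mean per genre."""
    buckets = defaultdict(list)  # genre -> ratings, like INTO r = e.rating
    for movie, edge in pairs:
        for genre in movie["genre"]:  # like FOR g IN m.genre
            buckets[genre].append(edge["rating"])
    # len(r) plays the role of count, mean(r) the role of AVERAGE(r)
    return {genre: len(r) * mean(r) for genre, r in buckets.items()}


stats = genre_stats(rated)
print(stats)
```

With this sample data, "Drama" has two ratings (4 and 2), so its value is 2 * 3 = 6, while "Crime" and "Comedy" have one rating each and keep that rating as their value.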
