Added some more details on collections recipe for List CTR
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/4eb80bd4 Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/4eb80bd4 Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/4eb80bd4 Branch: refs/heads/TINKERPOP-1857 Commit: 4eb80bd4878d455bff761c091b8fcd8414d59660 Parents: 586e432 Author: Stephen Mallette <[email protected]> Authored: Tue Jan 16 14:31:55 2018 -0500 Committer: Stephen Mallette <[email protected]> Committed: Tue Jan 16 14:31:55 2018 -0500 ---------------------------------------------------------------------- docs/src/recipes/collections.asciidoc | 282 ++++++++++++++++++++++++++++- 1 file changed, 279 insertions(+), 3 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/4eb80bd4/docs/src/recipes/collections.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/recipes/collections.asciidoc b/docs/src/recipes/collections.asciidoc index 446ed46..5db1e6b 100644 --- a/docs/src/recipes/collections.asciidoc +++ b/docs/src/recipes/collections.asciidoc @@ -21,9 +21,237 @@ image:gremlin-collections.png[width=400] Lists and maps form the basis for much of the processing in Gremlin traversals. They are core to how side-effects are typically held and how results are generally produced. Being able to pick them apart and reformat them is sometimes -required. +required. This need to shape the date within a traversal may arise both at the +link:http://tinkerpop.apache.org/docs/current/reference/#terminal-steps[terminal step] of the traversal (technically +just prior to the terminal step) or in the middle of a traversal. Considering the former, a transformation just prior +to iteration will get the result into the form required by the application which would remove the need for additional +application level manipulation. Moreover, a transformation at this stage may reduce the size of the payload being +returned which could be useful in remote applications. Examining the latter, there may be times where a `List` or `Map` +requires some mid-traversal transformation so as to continue with the general logic of the traversal itself. For +example, a traversal at some point might produce a `Map` of `List` objects where the lists contain vertices, where each +`List` might need to be sorted by some criteria and then the top item for each extracted to become the basis for the +continued traversal. Executing transformations for either of these types of situations can be made possible with the +patterns described in this section. -One of the most common ways to end up with a `Map` is with `valueMap()`: +The appearance of a `List` as a traverser in Gremlin usually arises as a result of a `fold()` operation, but may also +appear by way of some side-effect steps like `store()`: + +[gremlin-groovy,modern] +---- +g.V().fold() +g.V().store('a').cap('a') +---- + +It is worth noting that while a `Path` is not technically a `List` it does present like one and can be manipulated in +similar fashion to lists: + +[gremlin-groovy,modern] +---- +g.V().out().out().path() +---- + +These examples are obviously trivial and there are other ways that a traverser might end up in a `List` form, but, at +this moment, the point here is to focus less on how to get a `List` and more on how to manipulate one within the +Gremlin language. The examples going forward will also be similarly contrived insofar as producing a usable `List` to +manipulate. Bear in mind that it may be quite possible to get the same end results of these examples using more direct +means than what is demonstrated. + +It may seem simple, but the most obvious choice to modifying what is in a list is to simply `unfold()` the `List`: + +[gremlin-groovy,modern] +---- +g.V().fold().unfold().values('name') +g.V().store('a').cap('a').unfold().values('name') +---- + +The above examples show that `unfold()` works quite well when you want don't want to preserve the `List` structure of +the traverser as it just flattens `List` traversers to the traversal stream. The above examples only have one `List` +as a result, but consider what happens when there is more than one: + +[gremlin-groovy,modern] +---- +g.V().union(fold(),fold()) +g.V().union(fold(),fold()).unfold().values('name') +---- + +The two separate `List` traversers are flattened to a single traversal stream and all the results are mixed together. +While this approach may be acceptable, there are many cases where it might not be so. To preserve the individual +structure of the `List` traversers "locally" `unfold()` the lists to transform them: + +[gremlin-groovy,modern] +---- +g.V(). + union(fold(),fold()). + local(unfold().values('name').fold()) +---- + +The call to `local()` executes its anonymous sub-traversal over each individual `List` iterator and as the +sub-traversal ends with a `fold()` step, the results are reduced back into a `List` to preserve the original structure, +thus maintaining two traverser results. + +This pattern for unfolding and folding `List` traversers ends up having other applications: + +[gremlin-groovy,modern] +---- +g.V().union(limit(3).fold(),tail(3).fold()) <1> +g.V().union(limit(3).fold(),tail(3).fold()). + local(unfold(). <2> + order(). + by(bothE().count(),decr). + limit(1). + fold()) +g.V().union(limit(3).fold(),tail(3).fold()). <3> + local(unfold(). + has('age',gte(29)). + values('age'). + mean()) +---- + +<1> The output consists of two `List` traversers. +<2> For each `List` of vertices, order them by their number of edges, and choose the first one which will be the one +with the highest degree (i.e. number of edges). By ending with `fold()` the `List` traverser structure is preserved +thus returning two `List` objects. Consider this a method for choosing a "max" or a highly ranked vertex. In this case +the rank was determined by the number of edges, but it could have just as easily been determined by a vertex property, +edge property, a calculated value, etc. - one simply needs to alter the `by()` step modulator to `order()`. +<3> For each `List` of vertices, filter that `List` to only include vertices that have an "age" property with a +value greater than or equal to "29" and then average the results of each list. More generally, consider how this +approach performs some kind of reducing calculation on each `List` traverser. In this case, an average was calculated, +but it might also have been a `sum()`, `count()` or similar operation that reduced the list to a single calculated +value. + +So far, this section has focused on what to do with a `List` traverser once there is one present and there have been +fairly contrived examples for how to produce one in the first place. The use of `fold()` has been used most frequently +at this point to achieve list creation and that step should be recalled whenever there is a need to reduce some +traversal stream to an actual `List`. Of course, it may become necessary to more manually construct a `List`, +especially in cases where the expected output of the traversal is composed of one or more ordered results in the +form of a `List`. For example, consider the following three traversals: + +[gremlin-groovy,modern] +---- +g.V().has('name','marko').values('age') <1> +g.V().has('name','marko'). <2> + repeat(out()). + until(has('lang','java')). + path(). + by('name') +g.V().has('name','marko'). <3> + repeat(outE().inV()). + until(has('lang','java')). + path(). + local(unfold(). + has('weight'). + values('weight'). + mean()) +---- + +<1> Get the age of "marko" +<2> get the "name" values of the vertices in the collected paths that traverse out from "marko" to any vertex with +the "lang" of "java". +<3> Get the average of the "weight" values of edges in the collected paths that traverse out from "marko" to any vertex +with the "lang" of "java". Note the use of the earlier defined pattern that used `local()` in conjunction with +`unfold()`. In this case it filters out vertices from the `Path` as they are not relevant as the concern is only with +the "weight" property on the edges. + +For purposes of this example, the three traversals above happen to represent three pieces of data that are required by +an application. It is plain to note that all of the above traversals hold a similar pattern that starts with +"getting 'marko'" and, in the case of the latter two, traversing on outgoing edges away from him and collecting data +from that path. Ideally, all three of these traversals should execute as one to prevent having to submit three separate +traversals, thus incurring additional query execution costs for what amounts to be largely the same underlying data but +with different transformations applied. The goal here would be to return the results of this data as a `List` with +three results (i.e. triple) that could then be submitted once by the application. The following example demonstrates +the use of `store()` to aid in construction of this `List`: + +[gremlin-groovy,modern] +---- +g.V().has('name','marko').as('v'). <1> + store('a'). <2> + by('age'). + repeat(outE().as('e').inV().as('v')). <3> + until(has('lang','java')). + aggregate('b'). <4> + by(select(all,'v').unfold().values('name').fold()). + aggregate('c'). <5> + by(select(all,'e').unfold().values('weight').mean()). + fold(). <6> + store('a'). <7> + by(cap('b')). + store('a'). <8> + by(cap('c')). + cap('a') +---- + +<1> Get the "marko" vertex and label that step as "v". +<2> Store the first "age" of "marko" as the first item in the `List` called "a", which will ultimately be the result. +<3> Execute the traversal away from "marko" and continue to traverse on outgoing edges until the vertex has the value +of "java" for the "lang" property. Note the labels of "e" and "v". Note that "e" will contain a `List` of all of the +edges that have been traversed and "v" will contain a `List` of all the vertices that have been traversed. +<4> The incoming traverser to `aggregate('b')` are vertices that terminate the `repeat()` (i.e. those with the "lang" +of "java"). Note however that the `by()` modulator overrides that traverser completely by starting a fresh stream of +the list of vertices in "v". Those vertices are unfolded to retrieve the name property from each and then are reduced +with `fold()` back into a list to be stored in the side-effected named "b". +<5> A similar use of `aggregate()` as the previous step, though this one turns "e" into a stream of edges to calculate +the `mean()` to store in a `List` called "c". Note that `aggregate()` was used here instead of `store()`, as +the former is an eager collection of the elements in the stream (`store()` is lazy) and will force the traversal to be +iterated up to that point before moving forward. Without that eager collection, "v" and "e" would not contain the +complete information required for the production of "b" and "c". +<6> Adding `fold()` step here is a bit of a trick. To see the trick, copy and paste all lines of Gremlin up to but +not including this `fold()` step and run them against the "modern" graph. The output is three vertices and if the +`profile()` step was added one would also see that the traversal contained three traversers. These three traversers +with a vertex in each one were produced from the `repeat()` step (i.e. those vertices that had the "lang" of "java" +when traversing away from "marko"). The `aggregate()` steps are side-effects and just allow the traversers to pass +through them unchanged. The `fold()` obviously converts those three traversers to a single `List` to make one +traverser with a `List` inside. That means that the remaining steps following the `fold()` will only be executed one +time each instead of three, which, as will be shown, is critical to the proper result. +<7> The single traverser with the `List` of three vertices in it passes to `store()`. The `by()` modulator presents an +override telling Gremlin to ignore the `List` of three vertices and simply grab the "b" side effect created earlier and +stick that into "a" as part of the result. The `List` with three vertices passes out unchanged as `store()` is a +side-effect step. +<8> Again, the single traverser with the `List` of three vertices passes to `store()` and again, the `by()` modulator +presents an override to include "c" into the result. + +All of the above code and explanation show that `store()` can be used to construct `List` objects as side-effects +which can then be used as a result. Note that `aggregate()` can be used to similar effect, should it make sense that +lazy `List` creation is not acceptable with respect to the nature of the traversal. An interesting sub-pattern that +emerges here is that the `by()` step can modulate its step to completely override the current traverser and ignore its +contents for purpose of that step. This ability to override a traverser acts as a powerful and flexible tool as it +means that each traverser can effectively become a completely different object as determined by a sub-traversal. + +Another interesting method for `List` creation was demonstrated a bit earlier but not examined in detail - the use of +`union()`. It was shown earlier in the following context where it helped create a `List` of two lists of three +vertices each: + +[gremlin-groovy,modern] +---- +g.V().union(limit(3).fold(),tail(3).fold()) +---- + +By folding the results of `union()`, it becomes possible to essentially construct lists with arbitrary traversal +results. + +[gremlin-groovy,modern] +---- +g.V(). + local(union(identity(), <1> + bothE().count()). + fold()) +g.V(). + store('x'). + by(union(select('x').count(local), <2> + identity(), + bothE().count()). + fold()). + cap('x') +---- + +<1> For each vertex, create a "pair" (i.e. a `List` of two objects) of the vertex itself and its edge count. +<2> For each vertex, create a "triple" (i.e. a `List`of three objects) of the index of the vertex (starting at zero), +the vertex itself and its edge count. + +The pattern here is to use `union()` in conjunction with `fold()`. As explained earlier, the `fold()` operation reduces +the stream from `union()` to a single `List` that is then fed forward to the next step in the traversal. + +Now that `List` patterns have been explained, there can now be some attention on `Map`. One of the most common ways +to end up with a `Map` is with `valueMap()`: [gremlin-groovy,modern] ---- @@ -64,4 +292,52 @@ on that sub-traversal (it does not iterate the entire thing). Generally speaking, a `Map` constructed as part of `group()` or `project()` will already be in the form required as the `by()` modulators would be written in such a fashion as to produce that final output. It would be unnecessary to deconstruct/reconstruct it. Be certain that there isn't a way to re-write the `group()` or `project()` to get the -desired output before taking this approach. \ No newline at end of file +desired output before taking this approach. + +In the following case, `project()` is used to create a `Map` that does not meet this requirement as it contains some +unavoidable extraneous keys in the output `Map`: + +[gremlin-groovy,modern] +---- +g.V(). + project('name','age','lang'). + by('name'). + by(coalesce(values('age'),constant('n/a'))). + by(coalesce(values('lang'),constant('n/a'))) +---- + +The use of `coalesce()` works around the problem where "age" and "lang" are not necessarily property keys present on +every single vertex in the traversal stream. When the "age" or "lang" are not present, the constant of "n/a" is +supplied. While this may be an acceptable output, it is possible to shape the `Map` to be "nicer": + +[gremlin-groovy,modern] +---- +g.V(). + project('name','age','lang'). + by('name'). + by(coalesce(values('age'),constant('n/a'))). + by(coalesce(values('lang'),constant('n/a'))). + local(unfold(). + filter(select(values).is(P.neq('n/a'))). + group(). + by(keys). + by(values)) +---- + +The addition steps above `unfold()` the `Map` to key-value entries and filter the values for "n/a" and remove them +prior to reconstructing the `Map` with the method shown earlier. To go a step further, apply the pattern presented +earlier to flatten `List` values within a `Map`: + +[gremlin-groovy,modern] +---- +g.V(). + project('name','age','lang'). + by('name'). + by(coalesce(values('age'),constant('n/a'))). + by(coalesce(values('lang'),constant('n/a'))). + local(unfold(). + filter(select(values).is(P.neq('n/a'))). + group(). + by(keys). + by(select(values).unfold())) +---- \ No newline at end of file
