Repository: tinkerpop
Updated Branches:
  refs/heads/master b36b42f6c -> e790e56aa


Add "centrality" recipes. CTR


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/e790e56a
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/e790e56a
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/e790e56a

Branch: refs/heads/master
Commit: e790e56aaa2118b07e36f2579502786bd9d79cc0
Parents: b36b42f
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Jun 10 16:50:35 2016 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Fri Jun 10 16:50:35 2016 -0400

----------------------------------------------------------------------
 docs/src/recipes/centrality.asciidoc      | 138 +++++++++++++++++++++++++
 docs/src/recipes/index.asciidoc           |   2 +
 docs/static/images/betweeness-example.png | Bin 0 -> 8465 bytes
 3 files changed, 140 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e790e56a/docs/src/recipes/centrality.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/centrality.asciidoc 
b/docs/src/recipes/centrality.asciidoc
new file mode 100644
index 0000000..cb3ce12
--- /dev/null
+++ b/docs/src/recipes/centrality.asciidoc
@@ -0,0 +1,138 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[centrality]]
+Centrality
+----------
+
+There are many measures of 
link:https://en.wikipedia.org/wiki/Centrality[centrality] which are meant to 
help identify
+the most important vertices in a graph. As these measures are common in graph 
theory, this section attempts to
+demonstrate how some of these different indicators can be calculated using 
Gremlin.
+
+[[degree-centrality]]
+Degree Centrality
+~~~~~~~~~~~~~~~~~
+
+link:https://en.wikipedia.org/wiki/Centrality#Degree_centrality[Degree 
centrality] is a measure of the number of
+edges associated to each vertex.
+
+[gremlin-groovy,modern]
+----
+g.V().bothE().group().by(otherV()).by(count())      <1>
+g.V().inE().group().by(inV()).by(count())           <2>
+g.V().outE().group().by(outV()).by(count())         <3>
+g.V().group().by().by(outE().count())               <4>
+----
+
+<1> Calculation of degree centrality which counts all incident edges on each 
vertex to include those that are both
+incoming and outgoing.
+<2> Calculation of in-degree centrality which only counts incoming edges to a 
vertex.
+<3> Calculation of out-degree centrality which only counts outgoing edges from 
a vertex.
+<4> Same calculation as the previous traversal, but includes all vertices, 
even those with a zero
+
+[[betweeness-centrality]]
+Betweeness Centrality
+~~~~~~~~~~~~~~~~~~~~~
+
+link:https://en.wikipedia.org/wiki/Betweenness_centrality[Betweeness 
centrality] is a measure of the number of times
+a vertex is found between the <<shortest-path,shortest path>> of each vertex 
pair in a graph.  Consider the following
+graph for demonstration purposes:
+
+image:betweeness-example.png[width=600]
+
+[gremlin-groovy ]
+----
+a = graph.addVertex('name','a')
+b = graph.addVertex('name','b')
+c = graph.addVertex('name','c')
+d = graph.addVertex('name','d')
+e = graph.addVertex('name','e')
+a.addEdge('next',b)
+b.addEdge('next',c)
+c.addEdge('next',d)
+d.addEdge('next',e)
+g.withSack(0).V().repeat(both().simplePath()).emit().path().        <1>
+  group().by(project("a","b").by(limit(local, 1)).                  <2>
+                              by(tail(local, 1))).
+          by(order().by(count(local))).                             <3>
+          select(values).as("shortestPaths").                       <4>
+          V().as("v").map(select("shortestPaths").unfold().         <5>
+                          filter(unfold().where(eq("v"))).count()).
+              sack(sum).barrier(normSack).sack().as("betweeness").  <6>
+          select("v","betweeness")
+----
+
+<1> Defines a Gremlin 
link:http://tinkerpop.apache.org/docs/x.y.z/reference/#sack-step[sack] with a 
value of zero,
+which represents the initial betweeness score for each vertex, and traverses 
on both incoming and outgoing edges
+avoiding <<cycle-detection, cyclic paths>>.
+<2> Group each path by the first and last vertex.
+<3> Reduce the list of paths to the shortest path between the first and last 
vertex by ordering on their lengths.
+<4> Recall that at this point, there is a `Map` keyed by first and last vertex 
and with a value of just the shortest
+path. Extract the shortest path with `select(values)`, since that's the only 
portion required for the remainder of
+the traversal.
+<5> Begin a mid-traversal iteration of all vertices to essentially count the 
shortest paths that pass through each one.
+<6> Sum the counts for each vertex using `sack()`, normalize the value and 
label it as the "betweeness" to be the score.
+
+[[closeness-centrality]]
+Closeness Centrality
+~~~~~~~~~~~~~~~~~~~~
+
+link:https://en.wikipedia.org/wiki/Centrality[Closeness centrality] is a 
measure of the distance of one vertex to all
+other reachable vertices in the graph.
+
+[gremlin-groovy,modern]
+----
+g.V().as("a").repeat(both().simplePath()).emit().as("b").
+  dedup().by(select("a","b")).
+  path().
+  group().by(limit(local, 1)).
+          by(count(local).map {1/it.get()}.sum())
+----
+
+The traversal starts by iterating over all vertices to find all paths that 
don't <<cycle-detection,cycle>>. It labels
+the start vertex as "a" and the ending vertex being emitted as "b". To perform 
this calculation, the measurement
+only requires the <<shortest-path,shortest path>> between each vertex pair, so 
the first occurrence of "a" and "b"
+as denoted in the `dedup().by(select("a","b"))` will be that shortest distance.
+
+To this point, the traversal has the unique list of shortest paths between 
each vertex pair and it is time to
+operate on that `path()`. To get the output displaying each start vertex with 
the centrality calculation, the `Path`
+objects can be grouped. In this case, `group()` takes two `by()` modulators. 
The first `by()` specifies the key for
+grouping and simply extracts the first vertex in the `Path` with `limit(local, 
1)`. The second `by()` produces the
+value for the grouping and in this case:
+
+. `(count(local)` - counts the length of each path to produce the __distance__
+. `map {1/it.get()}` - uses a closure to divide the __distance__ by "1"
+. `sum()` - sums those values to produce the centrality score for that vertex
+
+[[eigenvector-centrality]]
+Eigenvector Centrality
+~~~~~~~~~~~~~~~~~~~~~~
+
+A calculation of 
link:https://en.wikipedia.org/wiki/Centrality#Eigenvector_centrality[eigenvector
 centrality] uses the
+relative importance of adjacent vertices to help determine their centrality. 
In other words, unlike
+<<degree-centrality, degree centrality>> the vertex with the greatest number 
of incident edges does not necessarily
+give it the highest rank.
+
+[gremlin-groovy,modern]
+----
+g.V().repeat(groupCount("m").out()).times(30).cap('m')
+----
+
+The traversal iterates through each vertex in the graph and for each one 
repeatedly group counts each vertex that
+passes through using the vertex as the key. The `Map` of this group count is 
stored in a variable named "m". The
+`out()` traversal is repeated thirty times or until the paths are exhausted. 
Thirty iterations should provide enough
+time to converge on a solution. Calling `cap('m')` at the end simply extracts 
the `Map` side-effect stored in "m"
+and returns it from the traversal as the result.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e790e56a/docs/src/recipes/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/index.asciidoc b/docs/src/recipes/index.asciidoc
index 6f80ad0..3ac9142 100644
--- a/docs/src/recipes/index.asciidoc
+++ b/docs/src/recipes/index.asciidoc
@@ -44,6 +44,8 @@ include::if-then-based-grouping.asciidoc[]
 
 include::cycle-detection.asciidoc[]
 
+include::centrality.asciidoc[]
+
 Implementation Recipes
 ======================
 

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e790e56a/docs/static/images/betweeness-example.png
----------------------------------------------------------------------
diff --git a/docs/static/images/betweeness-example.png 
b/docs/static/images/betweeness-example.png
new file mode 100644
index 0000000..650ee53
Binary files /dev/null and b/docs/static/images/betweeness-example.png differ

Reply via email to