Hello,
Many moons ago, I wrote an email to this list called “The Beauty of Bulking”
which discussed the concept of traverser bulking.
This notion is perhaps the most important aspect of the Gremlin traversal
machine. It is what makes OLAP scale, speedy complex path-calculations
plausible, and runtimes on reverberant traversals extremely fast.
Bulking can be summed up easily:
1. If you have a stream of numbers, say [6, 5, 4, 4, 4, 6] you can map() those
numbers by *10 and thus do 6 arithmetic operations to yield [60, 50, 40, 40,
40, 60].
2. However, if stream ordering does not matter (an important constraint to
lift!), then you could get the “same” result with only 3 arithmetic operations
and yield [60, 60, 50, 40, 40, 40].
3. In short, “bulk” your objects — 6(2), 5(1), 4(3). Then you only need to
execute *10 three times.
Prior to this moment, the beauty of bulking has made itself apparent in every
OLAP job executed as well as in repeat(), barrier(), and the epic work from Ted
Wilmes on PathRetractionStrategy.
Kuppitz and I are adding to this subtle beauty with a rewrite of
LazyBarrierStrategy and its inclusion into the default TraversalStrategies that
are inherited (typically) by all graph system providers.
Lets look at what we have done to master/ and what you can expect in TinkerPop
3.2.3.
graph = TinkerGraph.open()
graph.io(gryo()).readGraph('data/grateful-dead.kryo')
g1 = graph.traversal().withoutStrategies(LazyBarrierStrategy.class)
g2 = graph.traversal()
From g1 (no beauty) and g2 (beautiful), lets see how various traversals perform.
gremlin> clock(10){g1.V().out().in().out().count().iterate()}
==>1073.5150449999999
gremlin> clock(10){g2.V().out().in().out().count().iterate()}
==>8.8109061
Deep traversals with lots of intersections do the best. A 1 second query is
dropped to 8 milliseconds. Thats a 100x+ speed improvement.
gremlin> clock(1000){g1.V().out().values('name').iterate()}
==>1.1394026739999998
gremlin> clock(1000){g2.V().out().values('name').iterate()}
==>1.030407659
Not that much intersection above, but we still get a little bump in performance.
**DRUM ROLL — THE PHOENIX RISES FROM THE TEXTS OF EDFU**
Lets get practical.
Graph analysis is all about “reverberation.” What does that mean? — look at any
graph algorithm. Its not about just “getting data," its about seeing how paths
intersect/overlay/loop amongst themselves and each other. Paths tell you a lot
about the “structure” of your data — topological statistics. PageRank,
Betweenness, Eccentricity, Eigenvectors, Recommendations, Spreading Activation,
…. these algorithms are all about the paths, not “the data." Lets look at a
classic Amazon.com <http://amazon.com/>-style recommendation algorithm:
“Person ‘a' likes X. X are liked by people B. People B like Y.
Recommend ‘a’ items from Y that are not in X.”
g.V(89).out('followedBy').aggregate('a').
in('followedBy').out(‘followedBy').
where(not(within('a'))).
groupCount().by('name')
Okay. So lets do that walk on our toy graph, where Person ‘a’ is v[89].
gremlin>
clock(100){g1.V(89).out('followedBy').aggregate('a').in('followedBy').out('followedBy').where(not(within('a'))).groupCount().by('name').iterate()}
==>23.73006225
gremlin>
clock(100){g2.V(89).out('followedBy').aggregate('a').in('followedBy').out('followedBy').where(not(within('a'))).groupCount().by('name').iterate()}
==>1.4410725199999999
So there you have it. The classic, “everyone and their mother does
it”-traversal went from 23ms to 1.4ms. A 20x performance improvement on a
practical everyday example.
gremlin>
g2.V(89).out('followedBy').aggregate('a').in('followedBy').out('followedBy').where(not(within('a'))).groupCount().by('name').explain()
==>Traversal Explanation
================================================================================================================================================================================================================================================================================================================
Original Traversal [GraphStep(vertex,[89]),
VertexStep(OUT,[followedBy],vertex), AggregateStep(a),
VertexStep(IN,[followedBy],vertex), VertexStep(OUT,[followedBy],vertex),
WherePredicateStep(without([a])), GroupCountStep(value(name))]
...
LazyBarrierStrategy [O] [GraphStep(vertex,[89]),
VertexStep(OUT,[followedBy],vertex), AggregateStep(a),
VertexStep(IN,[followedBy],vertex), NoOpBarrierStep(10000),
VertexStep(OUT,[followedBy],vertex), NoOpBarrierStep(10000),
WherePredicateStep(without([a])), GroupCountStep(value(name))]
...
Final Traversal [TinkerGraphStep(vertex,[89]),
VertexStep(OUT,[followedBy],vertex), AggregateStep(a),
VertexStep(IN,[followedBy],vertex), NoOpBarrierStep(10000),
VertexStep(OUT,[followedBy],vertex), NoOpBarrierStep(10000),
WherePredicateStep(without([a])), GroupCountStep(value(name))]
Word to your mutha,
Marko.
http://markorodriguez.com