Hello,

Many moons ago, I wrote an email to this list called “The Beauty of Bulking” 
which discussed the concept of traverser bulking.

This notion is perhaps the most important aspect of the Gremlin traversal 
machine. It is what makes OLAP scale, speedy complex path-calculations 
plausible, and runtimes on reverberant traversals extremely fast.

Bulking can be summed up easily:

1. If you have a stream of numbers, say [6, 5, 4, 4, 4, 6] you can map() those 
numbers by *10 and thus do 6 arithmetic operations to yield [60, 50, 40, 40, 
40, 60].
2. However, if stream ordering does not matter (an important constraint to 
lift!), then you could get the “same” result with only 3 arithmetic operations 
and yield [60, 60, 50, 40, 40, 40].
3. In short, “bulk” your objects — 6(2), 5(1), 4(3). Then you only need to 
execute *10 three times.

Prior to this moment, the beauty of bulking has made itself apparent in every 
OLAP job executed as well as in repeat(), barrier(), and the epic work from Ted 
Wilmes on PathRetractionStrategy.

Kuppitz and I are adding to this subtle beauty with a rewrite of 
LazyBarrierStrategy and its inclusion into the default TraversalStrategies that 
are inherited (typically) by all graph system providers.

Lets look at what we have done to master/ and what you can expect in TinkerPop 
3.2.3.

graph = TinkerGraph.open()
graph.io(gryo()).readGraph('data/grateful-dead.kryo')
g1 = graph.traversal().withoutStrategies(LazyBarrierStrategy.class)
g2 = graph.traversal()

From g1 (no beauty) and g2 (beautiful), lets see how various traversals perform.

gremlin> clock(10){g1.V().out().in().out().count().iterate()}
==>1073.5150449999999
gremlin> clock(10){g2.V().out().in().out().count().iterate()}
==>8.8109061

Deep traversals with lots of intersections do the best. A 1 second query is 
dropped to 8 milliseconds. Thats a 100x+ speed improvement.

gremlin> clock(1000){g1.V().out().values('name').iterate()}
==>1.1394026739999998
gremlin> clock(1000){g2.V().out().values('name').iterate()}
==>1.030407659

Not that much intersection above, but we still get a little bump in performance.

**DRUM ROLL — THE PHOENIX RISES FROM THE TEXTS OF EDFU**

Lets get practical. 

Graph analysis is all about “reverberation.” What does that mean? — look at any 
graph algorithm. Its not about just “getting data," its about seeing how paths 
intersect/overlay/loop amongst themselves and each other. Paths tell you a lot 
about the “structure” of your data — topological statistics. PageRank, 
Betweenness, Eccentricity, Eigenvectors, Recommendations, Spreading Activation, 
…. these algorithms are all about the paths, not “the data." Lets look at a 
classic Amazon.com <http://amazon.com/>-style recommendation algorithm:

        “Person ‘a' likes X. X are liked by people B. People B like Y. 
Recommend ‘a’ items from Y that are not in X.”

g.V(89).out('followedBy').aggregate('a').
  in('followedBy').out(‘followedBy').
    where(not(within('a'))).
  groupCount().by('name')

Okay. So lets do that walk on our toy graph, where Person ‘a’ is v[89].

gremlin> 
clock(100){g1.V(89).out('followedBy').aggregate('a').in('followedBy').out('followedBy').where(not(within('a'))).groupCount().by('name').iterate()}
==>23.73006225
gremlin> 
clock(100){g2.V(89).out('followedBy').aggregate('a').in('followedBy').out('followedBy').where(not(within('a'))).groupCount().by('name').iterate()}
==>1.4410725199999999

So there you have it. The classic, “everyone and their mother does 
it”-traversal went from 23ms to 1.4ms. A 20x performance improvement on a 
practical everyday example.

gremlin> 
g2.V(89).out('followedBy').aggregate('a').in('followedBy').out('followedBy').where(not(within('a'))).groupCount().by('name').explain()
==>Traversal Explanation
================================================================================================================================================================================================================================================================================================================
Original Traversal                 [GraphStep(vertex,[89]), 
VertexStep(OUT,[followedBy],vertex), AggregateStep(a), 
VertexStep(IN,[followedBy],vertex), VertexStep(OUT,[followedBy],vertex), 
WherePredicateStep(without([a])), GroupCountStep(value(name))]
...
LazyBarrierStrategy          [O]   [GraphStep(vertex,[89]), 
VertexStep(OUT,[followedBy],vertex), AggregateStep(a), 
VertexStep(IN,[followedBy],vertex), NoOpBarrierStep(10000), 
VertexStep(OUT,[followedBy],vertex), NoOpBarrierStep(10000), 
WherePredicateStep(without([a])), GroupCountStep(value(name))]
...
Final Traversal                    [TinkerGraphStep(vertex,[89]), 
VertexStep(OUT,[followedBy],vertex), AggregateStep(a), 
VertexStep(IN,[followedBy],vertex), NoOpBarrierStep(10000), 
VertexStep(OUT,[followedBy],vertex), NoOpBarrierStep(10000), 
WherePredicateStep(without([a])), GroupCountStep(value(name))]

Word to your mutha,
Marko.

http://markorodriguez.com



Reply via email to