Hello everyone,

There is currently no active development on TinkerPop 3.2.0. However, in my 
spare time I've been developing (on paper) some new ideas that should make 
traversals, DSLs, and OLAP even better.

---------------------------------------------------------------------------------------

Problem #1: The Builder pattern for TraversalSources is lame.
        [https://issues.apache.org/jira/browse/TINKERPOP-971]

The first proposal is to use a fluent API to construct a TraversalSource 
and then, ultimately, spawn a Traversal. For instance:

        g = graph.traversal().
              withComputer(SparkGraphComputer).
              withStrategy(MyPersonalStrategy.instance());

And, something we can't currently do in 3.1.x:

        g = graph.traversal().
              withComputer(graph -> graph.compute(SparkGraphComputer).workers(10)).
              withStrategy(MyPersonalStrategy.instance());

In essence, like Traversal, a TraversalSource is constructed in a fluent 
manner. The methods for TraversalSource would be:

        withComputer()
        withStrategy()
        withoutStrategy() // remove default strategies
        withBulk()        // 3.2.0 will provide generalized bulking
                          // [https://issues.apache.org/jira/browse/TINKERPOP-960]
        withSack()        // you can declare the sack once and reuse its
                          // definition with each traversal spawned
        withSideEffect()  // like withSack() (and don't worry, immutable and
                          // thread-safe... you'll see)
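
To make the "immutable, thread-safe" remark concrete, here is a minimal Java sketch of one way a fluent source could work: every with-method returns a new source instead of mutating the receiver. SketchSource and its String-based fields are hypothetical stand-ins for illustration only, not the TinkerPop classes or the proposed implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a TraversalSource; NOT the TinkerPop class.
final class SketchSource {
    private final String computer;          // null means plain OLTP (assumption)
    private final List<String> strategies;  // strategy names, in registration order

    private SketchSource(String computer, List<String> strategies) {
        this.computer = computer;
        this.strategies = Collections.unmodifiableList(strategies);
    }

    // Starting point, playing the role of graph.traversal().
    static SketchSource standard() {
        return new SketchSource(null, new ArrayList<>());
    }

    // Each with-method copies the state and returns a new source,
    // so a shared source is never modified after construction.
    SketchSource withComputer(String computerName) {
        return new SketchSource(computerName, new ArrayList<>(strategies));
    }

    SketchSource withStrategy(String strategy) {
        List<String> copy = new ArrayList<>(strategies);
        copy.add(strategy);
        return new SketchSource(computer, copy);
    }

    SketchSource withoutStrategy(String strategy) {
        List<String> copy = new ArrayList<>(strategies);
        copy.remove(strategy);
        return new SketchSource(computer, copy);
    }

    String computer() { return computer; }
    List<String> strategies() { return strategies; }
}
```

With this shape, a base source can be shared between threads and specialized per use (e.g. base.withComputer("SparkGraphComputer")) without the base ever changing.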

Finally, for custom DSLs with respective TraversalSources, users will be able 
to do:
        [https://issues.apache.org/jira/browse/TINKERPOP-786]

        social = graph.traversal(SocialTraversalSource.class).
                   withComputer(…).withStrategy(…).withBulk(…)

---------------------------------------------------------------------------------------

Problem #2: It is not natural going from OLTP to OLAP to OLTP to OLAP.
        [https://issues.apache.org/jira/browse/TINKERPOP-570]

For this problem, I think we can go far with an 
AbstractVertexProgramStep<ComputerResult,ComputerResult> in the core step 
library with, for example, TraversalVertexProgramStep and 
PageRankVertexProgramStep being subclasses. What does this get us?

        g = graph.traversal().withComputer(SparkGraphComputer)
        g.V().values("name")

The above traversal would compile to:

        [TraversalVertexProgramStep([GraphStep,PropertiesStep(values,name)]),
         ComputerResultStep]

Thus, TraversalVertexProgramStep would simply pass its ComputerResult to 
ComputerResultStep<ComputerResult,E> which would know to flatMap-out 
computerResult.memory().get("~traversers"). Okay, so this is all fine and good 
and we currently do something analogous to this today. However, watch when we 
do an OLAP chain.

        g.V().hasLabel("person").
          pageRank(out("knows")).by("page.rank").
          valueMap("name","page.rank")

The above traversal would compile to:

        [TraversalVertexProgramStep([GraphStep,HasStep(label,person)]),
         PageRankVertexProgramStep(0.85,[VertexStep(knows)]),
         TraversalVertexProgramStep([PropertyMapStep(name,page.rank)]),
         ComputerResultStep]

The first TraversalVertexProgramStep will give its ComputerResult to 
PageRankVertexProgramStep. PageRankVertexProgramStep will use the 
computerResult.graph() for its computation. Note that after the completion of 
this PageRank computation, the subsequent ComputerResult.graph() will have both 
HALTED_TRAVERSERS and PAGE_RANK properties on the vertices. Thus, when the 
final TraversalVertexProgramStep takes over, it simply brings the 
HALTED_TRAVERSERS (which are at person vertices) back to life and then it will 
execute its traversal which will be able to read the PAGE_RANK values! Tada!
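
The hand-off described above can be sketched in plain Java: each stage receives the previous stage's result, may add vertex properties or halted traversers, and passes it on, and a final step emits what the traversers see. SketchResult, SketchChain, and the constant rank value are hypothetical stand-ins for illustration; this is not the TinkerPop ComputerResult API and no real PageRank is computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a ComputerResult: vertex properties plus halted traversers.
final class SketchResult {
    final Map<String, Map<String, Object>> graph = new TreeMap<>(); // vertex id -> properties
    final List<String> haltedTraversers = new ArrayList<>();        // ids of vertices holding a traverser

    void addVertex(String id, String label, String name) {
        Map<String, Object> props = new TreeMap<>();
        props.put("label", label);
        props.put("name", name);
        graph.put(id, props);
    }
}

final class SketchChain {
    // Feed each stage the result produced by the stage before it.
    static SketchResult run(SketchResult start, List<UnaryOperator<SketchResult>> stages) {
        SketchResult current = start;
        for (UnaryOperator<SketchResult> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    // Plays the role of the first traversal stage: halt a traverser at each person vertex.
    static SketchResult haltAtPeople(SketchResult in) {
        for (Map.Entry<String, Map<String, Object>> e : in.graph.entrySet()) {
            if ("person".equals(e.getValue().get("label"))) {
                in.haltedTraversers.add(e.getKey());
            }
        }
        return in;
    }

    // Plays the role of the rank stage: writes a placeholder value, not a real PageRank.
    static SketchResult placeholderRank(SketchResult in) {
        for (Map<String, Object> props : in.graph.values()) {
            props.put("page.rank", 1.0d);
        }
        return in;
    }

    // Plays the role of the final stage: revive the halted traversers and read both properties.
    static List<String> emit(SketchResult in) {
        List<String> out = new ArrayList<>();
        for (String id : in.haltedTraversers) {
            Map<String, Object> props = in.graph.get(id);
            out.add(props.get("name") + "=" + props.get("page.rank"));
        }
        return out;
    }
}
```

The point of the sketch is only the data flow: the traversers halted by the first stage survive the rank stage untouched, so the last stage can resume them and read the property the middle stage wrote.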

To give even more street-cred to this idea, check this traversal:

        g.V().hasLabel("person").
          pageRank(out("knows")).by("page.rank").
          order().by("page.rank",decr).limit(10).values("name")

This will compile to:

        [TraversalVertexProgramStep([GraphStep,HasStep(label,person)]),   // OLAP
         PageRankVertexProgramStep(0.85,[VertexStep(knows)]),             // OLAP
         TraversalVertexProgramStep([OrderGlobalStep(page.rank,decr)]),ComputerResultStep, // OLAP
         RangeGlobalStep(0,10),PropertiesStep(values,name)]               // OLTP

The ability to compile arbitrary segments of a traversal into an OLAP job and 
then pass the ComputerResult.graph() between jobs will enable us to easily (w/o 
user awareness) move between OLTP/OLAP within a single traversal. Moreover, 
there are numerous traversal patterns that currently will not execute in OLAP 
(e.g. you may sometimes see exceptions like "mid-traversal barriers are not 
allowed"), but with this model, they will work as we will be able to go 
OLAP->OLTP->OLAP.

---------------------------------------------------------------------------------------

Conclusion

In conclusion, configuring a TraversalSource will be much more elegant and we 
will be able to incorporate VertexPrograms into the Traversal API. Moreover, 
much like we have "lambda steps" for user defined step functions (e.g. 
map{it.get() + Math.sqrt(10)}), we will have a program()-step.

        g.V().program(MyVertexProgram.instance()).values("my.vertex.program.property")

For all the VertexPrograms that TinkerPop provides, we will have respective 
steps in the GraphTraversal API that will have all the nice 
by()-modulations/etc.

I hope everyone sees the beauty of this new model and perhaps has some 
thoughts/recommendations regarding its design. Finally, as a parting note, 
contemplate this:

        g.V().hasLabel("person").
          pageRank(out("knows")).by("page.rank").
          bulkLoad(graph,"page.rank")  // needs thought, but you get the direction.

Take care,
Marko.

http://markorodriguez.com
