Hello,

Last week I was at a DataStax company retreat, so I had neither the focus
nor the time to code. When I found spare time here and there, I played with
Spark via the scala> console: stupid-simple "word count"-style computations,
just to get my feet wet. Along the way, however, I figured out how to do
"Faunus-style" MapReduce-based message passing via Spark's functional DSL,
and so this Monday morning I went to work.

        Day 1 (Monday): By the end of the day, I had a hardcoded
        PageRankVertexProgram working over SparkGraphComputer.
        Day 2 (Tuesday): By the end of the day, I had TraversalVertexProgram
        (i.e. Gremlin OLAP) working over SparkGraphComputer.
        Day 3 (Wednesday): This morning I got all the integration tests
        (save 3!?) working over SparkGraphComputer.

So, to toot the proverbial self-indulgent horn, the above sequence of events is 
a testament to three things:

        1. Spark is a really easy framework to work with.
        2. The OLAP GraphComputer API of TinkerPop3 is flexible and easy to
           implement.
        3. I am a ravenous monster on the code base when I have a mission.

This work is currently in master/ and will be available in TP3.M8. You can read 
the docs here:

        http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#_olap_hadoop_gremlin
        (Hadoop supports 3 GraphComputers)

        http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#sparkgraphcomputer
        (SparkGraphComputer)

Finally, for those interested in the implementation, please see:
        
https://github.com/apache/incubator-tinkerpop/tree/master/hadoop-gremlin/src/main/java/org/apache/tinkerpop/gremlin/hadoop/process/computer/spark

The core message passing algorithm is here:
        
https://github.com/apache/incubator-tinkerpop/blob/master/hadoop-gremlin/src/main/java/org/apache/tinkerpop/gremlin/hadoop/process/computer/spark/util/SparkHelper.java#L58
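
For readers unfamiliar with the "Faunus-style" pattern mentioned above, here is
a minimal sketch of one message-passing superstep written as a map step
followed by a reduce-by-key step. It uses plain Java streams, not Spark's API,
and it is not the code in SparkHelper.java. The class name
MessagePassingSketch, the method superstep, the alpha parameter, and the
PageRank-style update rule are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: plain Java streams standing in for a map/reduce-by-key
// pipeline. Not Spark's API and not the code in SparkHelper.java.
final class MessagePassingSketch {

    // One superstep of a PageRank-style computation.
    // Assumes every vertex is a key in ranks and every key of outEdges
    // is also a key in ranks (hypothetical input contract).
    static Map<String, Double> superstep(Map<String, List<String>> outEdges,
                                         Map<String, Double> ranks,
                                         double alpha) {
        Map<String, Double> inbox = outEdges.entrySet().stream()
            // "Map": each vertex emits (target, rank / outDegree) per out-edge.
            // Vertices without out-edges emit nothing.
            .flatMap(e -> e.getValue().stream()
                .map(target -> Map.entry(target,
                        ranks.get(e.getKey()) / e.getValue().size())))
            // "Reduce by key": sum all messages addressed to the same vertex.
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Double::sum));

        // Update: each vertex combines its summed inbox with a teleport term.
        int n = ranks.size();
        Map<String, Double> next = new HashMap<>();
        for (String vertex : ranks.keySet()) {
            next.put(vertex, (1.0 - alpha) / n + alpha * inbox.getOrDefault(vertex, 0.0));
        }
        return next;
    }

    public static void main(String[] args) {
        // Tiny hypothetical graph: a -> b, a -> c, b -> c, c -> a.
        Map<String, List<String>> graph = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("c"),
            "c", List.of("a"));
        Map<String, Double> ranks = Map.of("a", 1.0 / 3, "b", 1.0 / 3, "c", 1.0 / 3);
        System.out.println(superstep(graph, ranks, 0.85));
    }
}
```

On Spark, the same shape would typically be expressed over a pair RDD, for
example with flatMapToPair followed by reduceByKey. Whether SparkHelper.java
is structured exactly this way is not claimed here; see the link above for the
actual code.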

Note that I am no expert in Spark. Over time I hope to learn more (I bought
a Spark book on Amazon.com) and will iterate to make the message passing
algorithm more efficient in both time and space.

Enjoy!
Marko.

http://markorodriguez.com
