Hello,
Last week I was at a DataStax company retreat, so I had neither the focus
nor the time to code. When I found a spare moment here and there, I played
with Spark via the scala> console: stupid-simple "word count"-style
computations, just to get my feet wet. Along the way, I figured out how to do
"Faunus-style" MapReduce-based message passing with Spark's functional DSL, so
this Monday morning I went to work.
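To make "Faunus-style" message passing a bit more concrete, here is a minimal
plain-Java sketch (no Spark dependency) of a single superstep. Every vertex
emits messages along its out-edges. The messages are grouped by receiving
vertex and summed, and the result is joined back onto the full vertex set.
The stream steps only loosely mirror Spark's pair operations (flatMapToPair,
reduceByKey, a join). This is not the SparkHelper code. The class name, the
adjacency-map representation, and the in-degree example are illustrative
assumptions.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java sketch (no Spark dependency) of one MapReduce-style
// message-passing superstep. The steps loosely mirror Spark's
// flatMapToPair (emit), reduceByKey (combine) and a join back onto the
// vertex set; they are NOT TinkerPop's SparkHelper code.
class MessagePassingSketch {

    // Assumed representation: vertex id -> ids of its out-neighbors.
    static Map<Long, Long> superstep(Map<Long, List<Long>> adjacency) {
        // "Map" phase: every vertex sends the message 1 along each out-edge,
        // producing (receiverId, message) pairs.
        Stream<Map.Entry<Long, Long>> messages = adjacency.values().stream()
            .flatMap(targets -> targets.stream().map(t -> Map.entry(t, 1L)));

        // "Reduce" phase: group messages by receiver and sum them.
        Map<Long, Long> combined = messages.collect(Collectors.groupingBy(
            Map.Entry::getKey, Collectors.summingLong(Map.Entry::getValue)));

        // "Join" phase: attach the combined message to every vertex;
        // vertices that received no messages get 0.
        Map<Long, Long> result = new TreeMap<>();
        for (Long vertex : adjacency.keySet()) {
            result.put(vertex, combined.getOrDefault(vertex, 0L));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> graph = Map.of(
            1L, List.of(2L, 3L),
            2L, List.of(3L),
            3L, List.of());
        System.out.println(superstep(graph)); // prints {1=0, 2=1, 3=2}
    }
}
```

With a message of 1 and a summing combiner, one superstep computes each
vertex's in-degree; swapping in a different message and combiner gives other
vertex programs.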
Day 1 (Monday): By the end of the day, I had a hardcoded
PageRankVertexProgram running over SparkGraphComputer.
Day 2 (Tuesday): By the end of the day, I had TraversalVertexProgram
(i.e. Gremlin OLAP) working over SparkGraphComputer.
Day 3 (Wednesday): This morning, I got all but 3 (!?) of the integration
tests passing over SparkGraphComputer.
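PageRank is a natural fit for MapReduce-style message passing. On each
superstep, every vertex splits its current rank evenly across its out-edges,
and every receiver sums what arrives. The following is a hedged plain-Java
sketch, not TinkerPop's PageRankVertexProgram. The damping factor
(alpha = 0.85), the fixed iteration count, and the choice to simply drop rank
from vertices with no out-edges are assumptions made to keep it short.

```java
import java.util.*;

// Plain-Java sketch of PageRank as repeated message-passing supersteps.
// Illustrative only: NOT TinkerPop's PageRankVertexProgram. Damping factor,
// iteration count and dangling-vertex handling are assumptions.
class PageRankSketch {

    // adjacency: vertex id -> ids of its out-neighbors (assumed representation)
    static Map<Long, Double> pageRank(Map<Long, List<Long>> adjacency,
                                      double alpha, int iterations) {
        int n = adjacency.size();
        Map<Long, Double> rank = new TreeMap<>();
        for (Long v : adjacency.keySet()) rank.put(v, 1.0 / n);

        for (int i = 0; i < iterations; i++) {
            // Message phase: each vertex sends rank / outDegree to each neighbor,
            // and messages to the same receiver are combined by summing.
            Map<Long, Double> incoming = new HashMap<>();
            for (Map.Entry<Long, List<Long>> v : adjacency.entrySet()) {
                List<Long> out = v.getValue();
                if (out.isEmpty()) continue; // sketch drops dangling-vertex mass
                double share = rank.get(v.getKey()) / out.size();
                for (Long target : out) incoming.merge(target, share, Double::sum);
            }
            // Update phase: each vertex folds its combined message into a new rank.
            Map<Long, Double> next = new TreeMap<>();
            for (Long v : adjacency.keySet()) {
                next.put(v, (1 - alpha) / n + alpha * incoming.getOrDefault(v, 0.0));
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> graph = Map.of(
            1L, List.of(2L),
            2L, List.of(3L),
            3L, List.of(1L, 2L));
        pageRank(graph, 0.85, 30).forEach((v, r) ->
            System.out.printf("%d -> %.4f%n", v, r));
    }
}
```

When no vertex is dangling, the ranks keep summing to 1 across iterations,
which makes a handy sanity check on the message-combining step.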
So, to toot my own proverbial horn, this sequence of events is a testament
to three things:
1. Spark is a really easy framework to work with.
2. The OLAP GraphComputer API of TinkerPop3 is flexible and easy to
implement.
3. I am a ravenous monster on the code base when I have a mission.
This work is currently in master and will be available in TP3.M8. You can
read the docs here:
http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#_olap_hadoop_gremlin
(Hadoop supports 3 GraphComputers)
http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#sparkgraphcomputer
(SparkGraphComputer)
Finally, for those interested in the implementation, please see:
https://github.com/apache/incubator-tinkerpop/tree/master/hadoop-gremlin/src/main/java/org/apache/tinkerpop/gremlin/hadoop/process/computer/spark
The core message-passing algorithm is here:
https://github.com/apache/incubator-tinkerpop/blob/master/hadoop-gremlin/src/main/java/org/apache/tinkerpop/gremlin/hadoop/process/computer/spark/util/SparkHelper.java#L58
Note that I am no Spark expert. Over time I hope to learn more (I bought a
Spark book on Amazon.com) and will iterate on the message-passing algorithm
to make it more efficient in both time and space.
Enjoy!
Marko.
http://markorodriguez.com