Hi,

CURRENT: The GraphComputer framework assumes “vertex-centric” computing. That is, a vertex receives a message and does something with it. Moreover, it can send messages to other vertices.

We got this wrong, and I think we should do it right with GraphActors.

FUTURE: The GraphActors framework assumes “partition-centric” computing. That is, a partition receives a message and does something with it. Moreover, it can send messages to other partitions.

VertexProgram.execute(final Vertex vertex, Iterator<M> messages)

should have been:

PartitionProgram.execute(Partition partition, Iterator<M> message)

In fact, ActorProgram’s execute() method is defined as:

ActorProgram.execute(M message)

1. Every Actor owns a Partition and thus, you don’t need to pass in the Partition.
2. To support ASP (asynchronous) and BSP (synchronous) computing, you don’t provide an Iterator<M>, just an M as they come through (event-driven).
3. All partitions are assumed to have random access capabilities. All the data in the partition is randomly accessible.
4. A partition is a generalization of GraphComputer’s Vertex, where at the micro-limit, every Vertex is in its own Partition. This is how we think about SparkGraphComputer, GiraphGraphComputer, etc.: the “star graph.” However, by generalizing to larger subgraphs than just Vertex, we can have more work being done per iteration in SparkGraphComputer, etc. Moreover, by generalizing to partition, we don’t have to have all edges of a vertex co-located and thus, can support edge-cut systems (like DSEGraph).

So, what does this mean for the future? This injection from “vertex-centric” to “partition-centric” allows us to easily create SparkGraphActors.

Next, how do you verify if a traversal will be able to legally execute against the underlying GraphActors system? It depends on “the rules” of the Partitioner. A Partitioner should have Features which define the boundaries of its data sphere. By looking at those Features and looking at the semantics of the Traversal, it is possible to ensure that the Traversal will work against the Features. If not, ActorVerificationException. If so, execute it.

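To make the partition-centric idea concrete, here is a rough sketch of an event-driven program that owns its partition and only sends a message when a walk crosses a partition boundary. Apart from the ActorProgram.execute(M message) shape quoted above, every name here (Walker, Partition, Router, WalkActor, PartitionCentricSketch) is made up for illustration and is not the TinkerPop API; the single-threaded queue merely stands in for a real actor system such as Akka.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// All names below are hypothetical; this is not the TinkerPop API.

// A message: a walker sitting at a vertex with some hops left to take.
final class Walker {
    final int vertexId;
    final int hopsLeft;
    Walker(int vertexId, int hopsLeft) { this.vertexId = vertexId; this.hopsLeft = hopsLeft; }
}

// A partition: a randomly accessible subgraph (vertex id -> out-neighbor ids).
// Neighbors may live in another partition (edge-cut).
final class Partition {
    final Map<Integer, List<Integer>> outEdges = new HashMap<>();
    boolean owns(int vertexId) { return outEdges.containsKey(vertexId); }
}

// Event-driven: one message at a time, no Iterator<M>, no Partition argument.
interface ActorProgram<M> {
    void execute(M message);
}

// Stand-in for an actor system: a single queue, drained in arrival order.
final class Router {
    private final List<WalkActor> actors = new ArrayList<>();
    private final Queue<Walker> queue = new ArrayDeque<>();
    int remoteMessages = 0; // messages that crossed a partition boundary

    void register(WalkActor actor) { actors.add(actor); }

    void send(Walker walker) { remoteMessages++; queue.add(walker); }

    void run(Walker seed) {
        queue.add(seed);
        Walker walker;
        while ((walker = queue.poll()) != null) {
            for (WalkActor actor : actors) {
                if (actor.partition.owns(walker.vertexId)) { actor.execute(walker); break; }
            }
        }
    }
}

final class WalkActor implements ActorProgram<Walker> {
    final Partition partition; // owned by the actor, so never passed to execute()
    private final Router router;
    int finished = 0;          // walkers that ran out of hops in this partition

    WalkActor(Partition partition, Router router) {
        this.partition = partition;
        this.router = router;
    }

    @Override
    public void execute(Walker walker) {
        if (walker.hopsLeft == 0) { finished++; return; }
        for (int next : partition.outEdges.get(walker.vertexId)) {
            Walker moved = new Walker(next, walker.hopsLeft - 1);
            if (partition.owns(next)) execute(moved); // stay local: no message needed
            else router.send(moved);                  // cross the cut: send a message
        }
    }
}

final class PartitionCentricSketch {
    // Returns { total finished walkers, cross-partition messages }.
    static int[] demo() {
        Router router = new Router();
        Partition a = new Partition();
        a.outEdges.put(1, List.of(2, 3));
        a.outEdges.put(2, List.of(3));
        Partition b = new Partition();
        b.outEdges.put(3, List.of(4));
        b.outEdges.put(4, List.of());
        WalkActor actorA = new WalkActor(a, router);
        WalkActor actorB = new WalkActor(b, router);
        router.register(actorA);
        router.register(actorB);
        router.run(new Walker(1, 2)); // a 2-hop walk starting at vertex 1
        return new int[] { actorA.finished + actorB.finished, router.remoteMessages };
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("finished walkers: " + result[0]
                + ", cross-partition messages: " + result[1]);
    }
}
```

The hop from vertex 1 to vertex 2 stays inside the first partition and costs no message, which is the "more work per iteration" point. In the star-graph limit, where every vertex is its own partition, each hop would become a message.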
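As a strawman for the verification step, one way to picture it is a set check: the Partitioner advertises Features, the Traversal is analyzed into the features it needs, and anything missing is rejected up front. The feature names, the TraversalVerifier class, and the exception's constructor are all assumptions for illustration; only the ActorVerificationException name comes from the text above, and how a real Traversal would be mapped to required features is left open.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical feature names describing what a partition can see locally.
enum PartitionFeature {
    INCIDENT_EDGES_LOCAL,    // all edges of a vertex sit in the vertex's partition
    ADJACENT_VERTICES_LOCAL  // neighbors' data is readable without a message
}

// Named in the email; this constructor is an assumption.
final class ActorVerificationException extends RuntimeException {
    ActorVerificationException(String message) { super(message); }
}

final class TraversalVerifier {
    // Throws if the traversal needs a feature the partitioner does not provide.
    static void verify(Set<PartitionFeature> provided, Set<PartitionFeature> required) {
        EnumSet<PartitionFeature> missing = EnumSet.noneOf(PartitionFeature.class);
        missing.addAll(required);
        missing.removeAll(provided);
        if (!missing.isEmpty()) {
            throw new ActorVerificationException("Partitioner lacks: " + missing);
        }
    }
}

final class FeatureCheckSketch {
    public static void main(String[] args) {
        // A star-graph style partitioner: incident edges are local, neighbors are not.
        Set<PartitionFeature> starGraph = EnumSet.of(PartitionFeature.INCIDENT_EDGES_LOCAL);

        // A traversal that only reads incident edges passes.
        TraversalVerifier.verify(starGraph, EnumSet.of(PartitionFeature.INCIDENT_EDGES_LOCAL));

        // A traversal that reads neighbor data in place is rejected.
        try {
            TraversalVerifier.verify(starGraph, EnumSet.of(PartitionFeature.ADJACENT_VERTICES_LOCAL));
        } catch (ActorVerificationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```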
In conclusion, I’m starting to see GraphComputer as our OLAP 1.0 and GraphActors as our OLAP/OLTP 2.0. I put OLTP in there because, with systems like Akka that don’t require big bulk data migrations, you can execute against the Graph connection object… Even with SparkGraphActors, you could just have workers that work against Graph connection objects (the only RDD data is messages!!!). Thus, with GraphActors, we start to smear the concept of OLAP and OLTP.

Anywho, I think if we get GraphActors right, we will solve many of the shortcomings of GraphComputer while, at the same time, providing a powerful distributed graph computing framework.

Take care,
Marko.

http://markorodriguez.com
