Hi,

CURRENT: The GraphComputer framework assumes “vertex-centric” computing. That 
is, a vertex receives a message and does something with it. Moreover, it can 
send messages to other vertices.

We got this wrong and I think we should do it right with GraphActors.

FUTURE: The GraphActors framework assumes “partition-centric” computing. That 
is, a partition receives a message and does something with it. Moreover, it can 
send messages to other partitions.

——

VertexProgram.execute(final Vertex vertex, Iterator<M> messages)

should have been:

PartitionProgram.execute(Partition partition, Iterator<M> messages)

In fact, ActorProgram’s execute() method is defined as:

ActorProgram.execute(M message).

1. Every Actor owns a Partition and thus, you don’t need to pass in the 
Partition.
2. To support ASP (asynchronous) and BSP (synchronous) computing, you don’t 
provide an Iterator<M>, just each M as it comes through (event-driven).
3. All partitions are assumed to have random access capabilities. All the data 
in the partition is randomly accessible.
4. A partition is a generalization of GraphComputer’s Vertex, where at the 
micro-limit, every Vertex is in its own Partition. This is how we think about 
SparkGraphComputer, GiraphGraphComputer, etc.: the “star graph.” However, by 
generalizing to larger subgraphs than just Vertex, we can have more work being 
done per iteration in SparkGraphComputer, etc. Moreover, by generalizing to 
partition, we don’t have to have all edges of a vertex co-located and thus, can 
support edge-cut systems (like DSEGraph).
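
To make points 1-3 concrete, here is a minimal sketch of an event-driven actor that owns its partition. Only the execute(M message) shape comes from the signature above. SketchPartition, SketchActorProgram, CountingActor and the "forwarded" list are hypothetical names made up for illustration and are not TinkerPop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a partition: randomly accessible local data keyed by vertex id.
final class SketchPartition {
    private final Map<String, Long> counters = new HashMap<>();

    void add(String vertexId) { counters.put(vertexId, 0L); }

    boolean contains(String vertexId) { return counters.containsKey(vertexId); }

    void increment(String vertexId) { counters.merge(vertexId, 1L, Long::sum); }

    long count(String vertexId) { return counters.getOrDefault(vertexId, 0L); }
}

// Mirrors the execute(M message) shape; the interface name is an assumption.
interface SketchActorProgram<M> {
    void execute(M message);
}

// The actor owns its partition, so execute() only needs the message itself.
final class CountingActor implements SketchActorProgram<String> {
    private final SketchPartition partition;
    // Messages addressed to vertices that live in some other partition.
    final List<String> forwarded = new ArrayList<>();

    CountingActor(SketchPartition partition) { this.partition = partition; }

    @Override
    public void execute(String vertexId) {
        if (partition.contains(vertexId)) {
            partition.increment(vertexId); // random access into local data
        } else {
            forwarded.add(vertexId);       // a real system would send this to the owning actor
        }
    }
}
```

Each message is handled as it arrives, so the same program shape works whether the runtime delivers messages asynchronously or in synchronized rounds. A partition holding a single vertex gives back the vertex-centric case.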

So, what does this mean for the future? This injection from “vertex-centric” to 
“partition-centric” allows us to easily create SparkGraphActors. Next, how do 
you verify if a traversal will be able to legally execute against the 
underlying GraphActors system? It depends on “the rules” of the Partitioner. A 
Partitioner should have Features which define the boundaries of its data 
sphere. By looking at those Features and looking at the semantics of the 
Traversal, it is possible to ensure that the Traversal will work against the 
Features. If not, ActorVerificationException. If so, execute it.
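
A rough sketch of that check, assuming Features can be modeled as a set of flags. The PartitionFeature values and the FeatureVerifier class are hypothetical. ActorVerificationException is the name used above, but its superclass and constructor here are my assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical capabilities a Partitioner might advertise; not real TinkerPop names.
enum PartitionFeature { LOCAL_EDGES, CROSS_PARTITION_MESSAGING, GLOBAL_BARRIER }

// Name taken from the mail; extending RuntimeException is an assumption.
final class ActorVerificationException extends RuntimeException {
    ActorVerificationException(String message) { super(message); }
}

final class FeatureVerifier {
    // Throws if the traversal requires any feature the partitioner does not provide.
    static void verify(Set<PartitionFeature> required, Set<PartitionFeature> provided) {
        Set<PartitionFeature> missing = EnumSet.noneOf(PartitionFeature.class);
        missing.addAll(required);
        missing.removeAll(provided);
        if (!missing.isEmpty()) {
            throw new ActorVerificationException("Partitioner lacks required features: " + missing);
        }
    }
}
```

The hard part, deriving the required set from the semantics of the Traversal, is left out here. The sketch only shows the final comparison against what the Partitioner declares.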

In conclusion — I’m starting to see GraphComputer as our OLAP 1.0 and 
GraphActors as our OLAP/OLTP 2.0. I put in there OLTP because with systems like 
Akka that don’t require big bulk data migrations, you can execute against the 
Graph connection object. Even with SparkGraphActors, you could just have 
workers that work against Graph connection objects (the only RDD data is 
messages!!!). Thus, with GraphActors, we start to smear the concept of OLAP and 
OLTP.

Anywho — I think if we get GraphActors right, we will solve many of the 
shortcomings of GraphComputer while, at the same time, providing a powerful 
distributed graph computing framework. 

Take care,
Marko.

http://markorodriguez.com


