[
https://issues.apache.org/jira/browse/TINKERPOP3-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544269#comment-14544269
]
Matthias Broecheler commented on TINKERPOP3-498:
------------------------------------------------
The problem I see with that generalization is that TraversalVertexProgram uses
global messages excessively and thus implementations cannot exploit the
benefits of local message sending to make things more efficient.
With the current PageRankVP implementation, Fulgora can effectively use the
message pulling model which makes things very efficient (only random reads on
other vertices, one local write). TraversalVP's global messages require a bunch
of random writes which makes things incredibly slow since those writes must be
synchronized in addition to being random.
> Could we get rid of PageRankVertexProgram? -- is everything just
> TraversalVertexProgram?
> ----------------------------------------------------------------------------------------
>
> Key: TINKERPOP3-498
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-498
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: process
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
>
> There is the notion of a message in {{GraphComputer}}. A {{Traverser}} is
> sufficiently complex to represent any message (i.e. {{Traverser.get()}}).
> Next, the {{MessageCombiner}} is simply {{Traverser.merge()}}. Next, is
> {{PageRankVertexProgram}} (and others) simply a combination of {{Steps}} to
> execute via a {{TraversalVertexProgram}}.
> Look at the {{execute()}} method of {{PageRankVertexProgram}}:
> {code:java}
> @Override
> public void execute(final Vertex vertex, Messenger<Double> messenger,
> final Memory memory) {
> if (memory.isInitialIteration()) {
> messenger.sendMessage(this.countMessageScope, 1.0d);
> } else if (1 == memory.getIteration()) {
> double initialPageRank = 1.0d / this.vertexCountAsDouble;
> double edgeCount =
> StreamFactory.stream(messenger.receiveMessages(this.countMessageScope)).reduce(0.0d,
> (a, b) -> a + b);
> vertex.singleProperty(PAGE_RANK, initialPageRank);
> vertex.singleProperty(EDGE_COUNT, edgeCount);
> messenger.sendMessage(this.incidentMessageScope, initialPageRank
> / edgeCount);
> } else {
> double newPageRank =
> StreamFactory.stream(messenger.receiveMessages(this.incidentMessageScope)).reduce(0.0d,
> (a, b) -> a + b);
> newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) /
> this.vertexCountAsDouble);
> vertex.singleProperty(PAGE_RANK, newPageRank);
> messenger.sendMessage(this.incidentMessageScope, newPageRank /
> vertex.<Double>value(EDGE_COUNT));
> }
> }
> {code}
> Lets see what that looks like as a pure Gremlin play:
> {code:java}
> __.in().fold(0,(a,b) -> a+b).sideEffect(count ->
> count.sideEffect('count',count.get())).repeat(
> __.fold(0,(a,b)->a+b)
> .sideEffect(pageRank ->
> pageRank.sideEffect('pageRank',pageRank.get()))
> .map(pageRank -> pageRank.get() / pageRank.sideEffect('count'))
> .out()).times(30)
> {code}
> In essence, the first line of the traversal gets the edge counts. The
> {{repeat()}}-traversal aggregates the incoming pageRank values, sets the
> vertex's current pageRank value, divides the pageRank value by the number of
> edges, and splits that traverser amongst all the outgoing edges. It does this
> 30 times.
> In this way, all that needs to exist is {{TraversalVertexProgram}}. Moreover,
> its all just Gremlin. Of course, we can canned algorithms, e.g.:
> {code:java}
> public Algorithms {
> public static final Traversal<Vertex,Vertex> PAGE_RANK =
> __.in().fold(0,(a,b) -> a+b).sideEffect(count ->
> count.sideEffect('count',p.get())).repeat(
> __.fold(0,(a,b)->a+b)
> .sideEffect(pageRank ->
> pageRank.sideEffect('pageRank',pageRank.get()))
> .map(pageRank -> pageRank.get() / pageRank.sideEffect('count'))
> .out()).times(30);
> public static final Traversal<Vertex,Vertex> ANOTHER_ALGORITHM = ...
> }
> {code}
> This is pretty insane.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)