[
https://issues.apache.org/jira/browse/TINKERPOP3-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716818#comment-14716818
]
Marko A. Rodriguez commented on TINKERPOP3-498:
-----------------------------------------------
We are not going to do this. But what we should do, is see how far we can get
with making Gremlin OLAP be able to "easily" do PageRank OLAP. Will leave this
ticket open as a frame of reference.
Key ideas here are using the new mutation API in Gremlin as well as the math
API that [~dkuppitz] wants to add.
> Could we get rid of PageRankVertexProgram? -- is everything just
> TraversalVertexProgram?
> ----------------------------------------------------------------------------------------
>
> Key: TINKERPOP3-498
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-498
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.0.0-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
>
> There is the notion of a message in {{GraphComputer}}. A {{Traverser}} is
> sufficiently complex to represent any message (i.e. {{Traverser.get()}}).
> Next, the {{MessageCombiner}} is simply {{Traverser.merge()}}. Next, is
> {{PageRankVertexProgram}} (and others) simply a combination of {{Steps}} to
> execute via a {{TraversalVertexProgram}}.
> Look at the {{execute()}} method of {{PageRankVertexProgram}}:
> {code:java}
> @Override
> public void execute(final Vertex vertex, Messenger<Double> messenger,
> final Memory memory) {
> if (memory.isInitialIteration()) {
> messenger.sendMessage(this.countMessageScope, 1.0d);
> } else if (1 == memory.getIteration()) {
> double initialPageRank = 1.0d / this.vertexCountAsDouble;
> double edgeCount =
> StreamFactory.stream(messenger.receiveMessages(this.countMessageScope)).reduce(0.0d,
> (a, b) -> a + b);
> vertex.singleProperty(PAGE_RANK, initialPageRank);
> vertex.singleProperty(EDGE_COUNT, edgeCount);
> messenger.sendMessage(this.incidentMessageScope, initialPageRank
> / edgeCount);
> } else {
> double newPageRank =
> StreamFactory.stream(messenger.receiveMessages(this.incidentMessageScope)).reduce(0.0d,
> (a, b) -> a + b);
> newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) /
> this.vertexCountAsDouble);
> vertex.singleProperty(PAGE_RANK, newPageRank);
> messenger.sendMessage(this.incidentMessageScope, newPageRank /
> vertex.<Double>value(EDGE_COUNT));
> }
> }
> {code}
> Lets see what that looks like as a pure Gremlin play:
> {code:java}
> __.in().fold(0,(a,b) -> a+b).sideEffect(count ->
> count.sideEffect('count',count.get())).repeat(
> __.fold(0,(a,b)->a+b)
> .sideEffect(pageRank ->
> pageRank.sideEffect('pageRank',pageRank.get()))
> .map(pageRank -> pageRank.get() / pageRank.sideEffect('count'))
> .out()).times(30)
> {code}
> In essence, the first line of the traversal gets the edge counts. The
> {{repeat()}}-traversal aggregates the incoming pageRank values, sets the
> vertex's current pageRank value, divides the pageRank value by the number of
> edges, and splits that traverser amongst all the outgoing edges. It does this
> 30 times.
> In this way, all that needs to exist is {{TraversalVertexProgram}}. Moreover,
> its all just Gremlin. Of course, we can canned algorithms, e.g.:
> {code:java}
> public Algorithms {
> public static final Traversal<Vertex,Vertex> PAGE_RANK =
> __.in().fold(0,(a,b) -> a+b).sideEffect(count ->
> count.sideEffect('count',p.get())).repeat(
> __.fold(0,(a,b)->a+b)
> .sideEffect(pageRank ->
> pageRank.sideEffect('pageRank',pageRank.get()))
> .map(pageRank -> pageRank.get() / pageRank.sideEffect('count'))
> .out()).times(30);
> public static final Traversal<Vertex,Vertex> ANOTHER_ALGORITHM = ...
> }
> {code}
> This is pretty insane.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)