[ 
https://issues.apache.org/jira/browse/TINKERPOP-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228815#comment-15228815
 ] 

pieter martin commented on TINKERPOP-1250:
------------------------------------------

Is the idea to use the current {{PartitionStrategy}} to define the partition 
boundaries?

> Support Subgraph-Centric GraphComputer
> --------------------------------------
>
>                 Key: TINKERPOP-1250
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1250
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>
> Right now, {{GraphComputer}} and {{VertexPrograms}} are "vertex-centric." 
> That is, the boundary of an atomic unit of computation is a single vertex, 
> its properties, and its incident edges. We should support "subgraph-centric" 
> computing where each worker's partition is loaded in memory (RAM) as a 
> connected subgraph. For those vertices that are not in its partition, a 
> special "reference vertex" is used to reference it. What this would mean is 
> that when a {{Traverser}} is processing, it can continue to evaluate as long 
> as it's within the subgraph partition. The moment it references a vertex not 
> in the partition (a "reference vertex"), it serializes itself as a message.
> This would greatly increase the speed of Gremlin OLAP at the cost of 
> requiring a large amount of memory to store all the worker partition 
> subgraphs in RAM. There might even be a way to have a hybrid model where some 
> of the partition is held in RAM and the rest (even though still in the 
> same partition) is stored as "star vertices."
> How would this be added to {{GraphComputer}} in a backwards compatible way?
> {code}
> GraphComputer.supportsVertexCentricComputing() // currently true for all implementations
> GraphComputer.supportsSubgraphCentricComputing()
> {code}
> A {{VertexProgram}} could then have the following method:
> {code}
> boolean VertexProgram.withinCentricity(final M message)
> {code}
> If the message is NOT within "centricity", then it is serialized and 
> distributed, else it continues to execute.
> I haven't thought through all the API and implementation considerations. 
> Though it would be good to make this backwards compatible and, moving 
> forward, able to support "edge-centric computing" and thus allow for a very 
> memory-limited OLAP system.
> How to think of the different models:
> 1. Vertex-centric: medium speed, medium expressivity, medium memory costs.
> 2. Subgraph-centric: high speed, high expressivity, high memory costs.
> 3. Edge-centric: high speed, low expressivity, low memory costs.
> What is "expressivity"? Well, subgraph-centric computations can have "local 
> traversals" that move beyond the "star graph." Edge-centric would not support 
> any {{by()}}-modulators as the computation is bound to a single edge. Thus, 
> low expressivity. It really depends on how you represent the edge-centric 
> edge list. Do you have the vertices on each side of an edge duplicated with 
> all their properties? This would still have low memory costs, but you could get 
> more expressivity.
> Anywho, in general it would be nice if the underlying execution engine can 
> handle the three common distributed graph computing paradigms. 
> Subgraph-centric seems the easiest to support at this point in time.
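
For illustration only, the subgraph-centric behaviour described in the issue could be sketched roughly as below. This is not TinkerPop code: {{ComputerFeatures}}, {{Worker}}, {{Traverser}} and {{StepResult}} are hypothetical names, vertices are plain integer ids, and {{withinCentricity}} here takes a vertex id rather than a message. The sketch assumes a traverser keeps walking out-edges while it stays inside the worker's in-memory partition and is queued as a message once it reaches a vertex the worker does not own. The capability flags are written as default methods so that existing implementations would not need to change.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SubgraphCentricSketch {

    /** Hypothetical capability flags; defaults keep existing implementations unchanged. */
    public interface ComputerFeatures {
        default boolean supportsVertexCentricComputing() { return true; }
        default boolean supportsSubgraphCentricComputing() { return false; }
    }

    /** Hypothetical traverser: a current vertex id plus the number of hops left. */
    public static final class Traverser {
        public final int vertexId;
        public final int remainingHops;
        public Traverser(int vertexId, int remainingHops) {
            this.vertexId = vertexId;
            this.remainingHops = remainingHops;
        }
    }

    /** Outcome of running one traverser inside a single partition. */
    public static final class StepResult {
        public final List<Integer> finished = new ArrayList<>();   // ids where traversers halted locally
        public final List<Traverser> messages = new ArrayList<>(); // traversers that left the partition
    }

    /** Hypothetical worker that holds its partition's subgraph in memory. */
    public static final class Worker implements ComputerFeatures {
        private final Set<Integer> localVertices;           // vertices owned by this partition
        private final Map<Integer, List<Integer>> outEdges; // targets may be "reference vertices"

        public Worker(Set<Integer> localVertices, Map<Integer, List<Integer>> outEdges) {
            this.localVertices = localVertices;
            this.outEdges = outEdges;
        }

        @Override
        public boolean supportsSubgraphCentricComputing() { return true; }

        /** True if the vertex is held in this worker's partition. */
        public boolean withinCentricity(int vertexId) {
            return localVertices.contains(vertexId);
        }

        /** Evaluate locally for as long as possible; anything else becomes a message. */
        public StepResult execute(Traverser start) {
            StepResult result = new StepResult();
            Deque<Traverser> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                Traverser t = stack.pop();
                if (!withinCentricity(t.vertexId)) {
                    result.messages.add(t);            // reached a reference vertex: hand off
                } else if (t.remainingHops == 0) {
                    result.finished.add(t.vertexId);   // traversal completed inside the partition
                } else {
                    for (int next : outEdges.getOrDefault(t.vertexId, Collections.emptyList())) {
                        stack.push(new Traverser(next, t.remainingHops - 1));
                    }
                }
            }
            return result;
        }
    }

    /** Partition owns vertices 1, 2, 3; vertex 4 lives elsewhere. Edges: 1->2, 2->3, 2->4. */
    public static StepResult demo() {
        Map<Integer, List<Integer>> edges = new HashMap<>();
        edges.put(1, Arrays.asList(2));
        edges.put(2, Arrays.asList(3, 4));
        Worker worker = new Worker(new HashSet<>(Arrays.asList(1, 2, 3)), edges);
        return worker.execute(new Traverser(1, 2));
    }

    public static void main(String[] args) {
        StepResult r = demo();
        System.out.println("finished locally at: " + r.finished);
        System.out.println("traversers to send as messages: " + r.messages.size());
    }
}
```

In the demo, the two-hop walk from vertex 1 completes at vertex 3 without leaving the partition, while the branch that reaches vertex 4 is returned as a message for another worker. In the vertex-centric model, each of those hops would have been a message.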



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
