[ https://issues.apache.org/jira/browse/TINKERPOP-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stephen mallette updated TINKERPOP-1074:
----------------------------------------
    Fix Version/s:     (was: 3.2.7)

> More contractual testing/specifications around Persist and ResultGraph.
> -----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1074
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1074
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>
> A {{ComputerResult}} references two objects: a graph and a memory. The graph
> is the resultant computed graph and the memory contains all the sideEffect
> data from the computation (if any).
>
> Right now, we have the following {{Persist}} options: {{NOTHING}},
> {{VERTEX_PROPERTIES}}, {{EDGES}}. We also have the following {{ResultGraph}}
> options: {{ORIGINAL}}, {{NEW}}.
> * NOTHING + ORIGINAL = ComputerResult contains original graph reference.
> * NOTHING + NEW = ?? No test to force what this means! Should be
> {{EmptyGraph.instance()}}.
> * VERTEX_PROPERTIES + ORIGINAL = ComputerResult contains original graph, but
> the computed vertex properties have been "saved" to it. (no contractual test
> cases here either!)
> * VERTEX_PROPERTIES + NEW = ComputerResult contains new graph with only
> vertices and their properties.
> * EDGES + NEW = ComputerResult contains new graph with vertices, edges, and
> their properties.
> * EDGES + ORIGINAL = ComputerResult contains original graph, but the computed
> vertex properties and edges have been "saved" to it. (no contractual test
> cases here either!)
>
> {{TinkerGraphComputer}} is the only system that supports all the above
> configuration combinations. Add test cases to {{GraphComputerTest}} that
> verify the behavior of all combinations.
>
> HOWEVER !!!! ------ should we really respect ORIGINAL+PERSIST? Most providers
> will use {{BulkLoaderVertexProgram}} to write the computed graph back to the
> original graph. If there are TWO ways of doing this, this seems bad? In fact,
> the way that TinkerGraphComputer writes the computed graph back to the
> original graph is nearly identical to how BulkLoaderVertexProgram works.
> Thus, I'm wondering if we simply get rid of the concept of {{ResultGraph}} and
> ONLY have {{Persist}}.
> * Persist.NOTHING: Returns the original graph in {{ComputerResult}}.
> * Persist.VERTEX_PROPERTIES: Returns a new graph with only vertices and
> properties.
> * Persist.EDGES: Returns a new graph with vertices, edges, and their
> properties.
>
> For in-memory graphs like {{TinkerGraph}}, "new graph" can mean the original
> graph with the {{GraphView}} overlay. Thus, it's not really a full copy of the
> original graph. Moreover, Persist.NOTHING just garbage collects the GraphView,
> leaving the original graph.
> ------------------
> Next, what does {{Persist}} mean for memory? Remember, {{ComputerResult}}
> also has a reference to sideEffect memory. What if you want to run a job, NOT
> persist the graph, but persist the memory only? I think we should ALWAYS
> assume memory persistence. For TinkerGraph, that means the
> ComputerResult.memory() has a HashMap of memory values. For Giraph/Spark,
> that means that the {{Storage}} will always have resultant sideEffect data in
> the output directory even if there is no graph.
> * {{NOTHING}}: persist memory and return the original graph.
> * {{VERTEX_PROPERTIES}}: persist memory and return new graph of just vertex
> properties.
> * {{EDGES}}: persist memory and return new graph of vertex properties, and
> edges.
>
> Decisions, decisions, decisions....
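
For anyone picking this up, the six Persist/ResultGraph combinations listed in the issue can be written down as a small lookup, which is roughly the table a contractual test would have to pin. This is only a sketch: the enums below are hypothetical stand-ins that reuse the option names from the issue, not the TinkerPop {{GraphComputer}} types, and the NOTHING + NEW entry encodes the issue's suggestion (an empty graph), which is not yet an agreed contract.

```java
public class PersistContractSketch {

    // Hypothetical stand-ins named after the options in the issue.
    // They are NOT the real TinkerPop enums.
    enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }
    enum ResultGraph { ORIGINAL, NEW }

    /** Describes what the result graph is expected to hold for one combination. */
    static String expectedGraph(Persist persist, ResultGraph resultGraph) {
        if (resultGraph == ResultGraph.ORIGINAL) {
            switch (persist) {
                case NOTHING:
                    return "original graph, untouched";
                case VERTEX_PROPERTIES:
                    return "original graph + computed vertex properties";
                default: // EDGES
                    return "original graph + computed vertex properties and edges";
            }
        }
        switch (persist) {
            case NOTHING:
                // Assumption taken from the issue text: "Should be EmptyGraph.instance()".
                return "empty graph";
            case VERTEX_PROPERTIES:
                return "new graph: vertices and their properties";
            default: // EDGES
                return "new graph: vertices, edges, and their properties";
        }
    }

    public static void main(String[] args) {
        // Print the full matrix, one line per combination.
        for (ResultGraph r : ResultGraph.values()) {
            for (Persist p : Persist.values()) {
                System.out.println(p + " + " + r + " = " + expectedGraph(p, r));
            }
        }
    }
}
```

If {{ResultGraph}} were dropped as proposed, the lookup would collapse to the three {{Persist}} cases, with memory always persisted alongside.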