[jira] [Commented] (TINKERPOP3-319) BulkLoaderVertexProgram for generalized batch loading across graphs

Daniel Kuppitz (JIRA) Mon, 21 Sep 2015 10:03:59 -0700

    [ https://issues.apache.org/jira/browse/TINKERPOP3-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900987#comment-14900987 ]


Daniel Kuppitz commented on TINKERPOP3-319:
-------------------------------------------

I need to correct my previous comment. For 3.0.2 we also need to get Neo4j 
("normal" mode, not HA) and TinkerGraph working. I believe TinkerGraph was 
still under discussion (whether we should support persistence or not), but we 
will need Neo4j at a minimum.

> BulkLoaderVertexProgram for generalized batch loading across graphs
> -------------------------------------------------------------------
>
>                 Key: TINKERPOP3-319
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-319
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.0.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Daniel Kuppitz
>             Fix For: 3.0.2-incubating
>
>
> After working on {{BulkLoaderVertexProgram}} for Titan, it is trivial to add 
> this generally to TinkerPop -- equivalent to BlueprintsOutputFormat (or 
> whatever the Blueprints-specific bulk loader was called). However, 
> given that Titan and TinkerPop have the same data model, Titan having its own 
> {{BulkLoaderVertexProgram}} isn't necessary as there is no longer a data 
> model alignment issue. The difference would be that instead of:
> {code:groovy}
> g.V.compute().program(BulkLoaderVertexProgram.build().titan(propertiesFile).create()).submit()
> {code}
> It would simply be:
> {code:groovy}
> g.V.compute().program(BulkLoaderVertexProgram.build().factory(propertiesFile).create()).submit()
> {code}
> ...and {{BulkLoaderVertexProgram}} would use {{GraphFactory.open()}} to 
> instantiate the connection to the graph. Moreover, (and [~spmallette] will 
> need to clear my head here), if the factory opened up a Gremlin Server 
> connection, then we get parallel writing to embedded graph databases like 
> Neo4j.
> {{BulkLoaderVertexProgram}} is simply a vertex program that loads a graph 
> in parallel (with a graph computer) into any other graph that can be 
> accessed via {{GraphFactory}} (which is every TP3 graph).
> [~dalaro] @mbroecheler [~dkuppitz] 
> EXTENDED NOTES:
> * {{SchemaInference}} would be a MapReduce job executed prior to 
> {{BulkLoaderVertexProgram}}
> * Titan and Neo4j can each have their own {{SchemaInference}} implementations.
> * Incremental loading .... I forget how this worked.
> * Bulk mutations ... this is possible at the TP3 level with hidden properties 
> and smart add/remove/etc.
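
For readers unfamiliar with the {{GraphFactory.open()}} idea referenced in the
quoted description, here is a minimal, self-contained sketch of the mechanism:
a configuration names the graph implementation under the "gremlin.graph" key,
and the factory calls that class's static open(...) method reflectively. This
is an illustration only, not TinkerPop code. The names GraphFactorySketch,
TargetGraph, InMemoryTarget and the "sketch.label" key are hypothetical, and
java.util.Properties stands in for the configuration object the real factory
accepts.

```java
import java.lang.reflect.Method;
import java.util.Properties;

class GraphFactorySketch {

    /** Hypothetical stand-in for a graph handle; not TinkerPop's Graph interface. */
    interface TargetGraph {
        String describe();
    }

    /** Hypothetical implementation exposing the static open(...) hook the factory looks up. */
    static class InMemoryTarget implements TargetGraph {
        private final String label;

        private InMemoryTarget(String label) {
            this.label = label;
        }

        public static TargetGraph open(Properties conf) {
            // "sketch.label" is an assumed key used only for this illustration.
            return new InMemoryTarget(conf.getProperty("sketch.label", "unnamed"));
        }

        @Override
        public String describe() {
            return "InMemoryTarget(" + label + ")";
        }
    }

    /** Reads the implementation class from "gremlin.graph" and invokes its static open(Properties). */
    static TargetGraph open(Properties conf) throws ReflectiveOperationException {
        String className = conf.getProperty("gremlin.graph");
        if (className == null) {
            throw new IllegalArgumentException("missing required key: gremlin.graph");
        }
        Class<?> clazz = Class.forName(className);
        Method opener = clazz.getMethod("open", Properties.class);
        return (TargetGraph) opener.invoke(null, conf);
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty("gremlin.graph", InMemoryTarget.class.getName());
        conf.setProperty("sketch.label", "bulk-load-target");
        System.out.println(open(conf).describe());
    }
}
```

Because the caller only ever sees the configuration, a loader written this way
does not need a per-vendor builder method: switching the target graph means
changing the "gremlin.graph" entry rather than the loader code.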



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
