[jira] [Commented] (TINKERPOP3-319) BulkLoaderVertexProgram for generalized batch loading across graphs

stephen mallette (JIRA) Mon, 21 Sep 2015 05:50:00 -0700

    [ https://issues.apache.org/jira/browse/TINKERPOP3-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900597#comment-14900597 ]


stephen mallette commented on TINKERPOP3-319:
---------------------------------------------

[~dkuppitz] can you describe what you expect BulkLoaderVertexProgram (BLVP) to 
cover for 3.0.2? I think it would be good to understand what is expected there 
so that we know when this ticket is complete. We can then open new issues for 
3.1.x features as needed.

> BulkLoaderVertexProgram for generalized batch loading across graphs
> -------------------------------------------------------------------
>
>                 Key: TINKERPOP3-319
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-319
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.0.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Daniel Kuppitz
>             Fix For: 3.0.2-incubating
>
>
> After working on {{BulkLoaderVertexProgram}} for Titan, it is trivial to add 
> this to TinkerPop in general, as an equivalent of BlueprintsOutputFormat (or 
> whatever the Blueprints-specific bulk loader was called). In fact, 
> given that Titan and TinkerPop have the same data model, Titan having its own 
> {{BulkLoaderVertexProgram}} isn't necessary, as there is no longer a data 
> model alignment issue. The difference would be that instead of:
> {code:groovy}
> graph.compute().program(BulkLoaderVertexProgram.build().titan(propertiesFile).create()).submit()
> {code}
> It would simply be:
> {code:groovy}
> graph.compute().program(BulkLoaderVertexProgram.build().factory(propertiesFile).create()).submit()
> {code}
> ...and {{BulkLoaderVertexProgram}} would use {{GraphFactory.open()}} to 
> instantiate the connection to the graph. Moreover (and [~spmallette] will 
> need to clear my head here), if the factory opened a Gremlin Server 
> connection, then we would get parallel writing even to embedded graph 
> databases like Neo4j, since the workers would write through the server 
> rather than each opening the embedded database directly.
> {{BulkLoaderVertexProgram}} is simply a vertex program that loads a graph in 
> parallel (using a graph computer) into any other graph that can be accessed 
> via {{GraphFactory}} (which is every TP3 graph).
> [~dalaro] @mbroecheler [~dkuppitz] 
> EXTENDED NOTES:
> * {{SchemaInference}} would be a MapReduce job executed prior to 
> {{BulkLoaderVertexProgram}}
> * Titan and Neo4j can each have their own {{SchemaInference}} implementations.
> * Incremental loading... I forget how this worked.
> * Bulk mutations ... this is possible at the TP3 level with hidden properties 
> and smart add/remove/etc.

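To make the "load a graph in parallel into any other graph" idea concrete, here is a minimal Java sketch of a two-phase parallel bulk load, assuming an in-memory stand-in for the target graph. {{TargetGraph}}, {{BulkLoadSketch}} and {{load}} are hypothetical names, not TinkerPop APIs: a thread pool plays the role of the graph computer's workers, and a single thread-safe target plays the role of a graph reached through one server such as Gremlin Server.

{code:java}
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a target graph that many workers write to
// concurrently (e.g. a graph behind Gremlin Server). This is NOT a TinkerPop API.
class TargetGraph {
    private final AtomicLong nextId = new AtomicLong(1000);
    final Map<Long, String> vertices = new ConcurrentHashMap<>();  // target id -> label
    final Queue<long[]> edges = new ConcurrentLinkedQueue<>();     // {outTargetId, inTargetId}

    long addVertex(String label) {
        long id = nextId.getAndIncrement();
        vertices.put(id, label);
        return id;
    }

    void addEdge(long outId, long inId) {
        edges.add(new long[]{outId, inId});
    }
}

// Hypothetical two-phase parallel loader; the thread pool stands in for graph computer workers.
class BulkLoadSketch {
    static Map<Long, Long> load(Map<Long, String> srcVertices, List<long[]> srcEdges,
                                TargetGraph target, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Map<Long, Long> idMap = new ConcurrentHashMap<>();  // source id -> target id
        try {
            // Phase 1: copy every vertex in parallel and record its new target id.
            List<Future<?>> phase1 = new ArrayList<>();
            for (Map.Entry<Long, String> v : srcVertices.entrySet()) {
                phase1.add(pool.submit(() -> idMap.put(v.getKey(), target.addVertex(v.getValue()))));
            }
            awaitAll(phase1);  // barrier: every vertex exists before any edge is written
            // Phase 2: copy edges in parallel, translating endpoints through the id map.
            List<Future<?>> phase2 = new ArrayList<>();
            for (long[] e : srcEdges) {
                phase2.add(pool.submit(() -> target.addEdge(idMap.get(e[0]), idMap.get(e[1]))));
            }
            awaitAll(phase2);
        } finally {
            pool.shutdown();
        }
        return idMap;
    }

    // Wait for all tasks, surfacing any worker failure as an unchecked exception.
    private static void awaitAll(List<Future<?>> futures) {
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (InterruptedException | ExecutionException ex) {
                throw new IllegalStateException("bulk load task failed", ex);
            }
        }
    }
}
{code}

Vertices are copied before edges because an edge can only be written once both of its endpoints exist in the target graph. Here the source-to-target id mapping is an in-memory {{ConcurrentHashMap}}; an actual vertex program would more plausibly keep that mapping on the vertices themselves and separate the two phases with an iteration boundary.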

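The {{SchemaInference}} pre-pass from the extended notes can likewise be pictured as a map step followed by a reduce step. The Java below is a hypothetical in-memory illustration, assuming each vertex is given as a map from property key to value: the map step emits (key, value type) pairs and the reduce step groups them by key. {{SchemaInferenceSketch}} and its methods are not a TinkerPop or Titan API; a real job would run as a {{MapReduce}} on the graph computer, and each graph's own implementation would presumably turn the result into Titan- or Neo4j-specific schema.

{code:java}
import java.util.*;

// Hypothetical map/reduce-style schema inference over in-memory vertex property maps.
class SchemaInferenceSketch {
    // "Map" step: emit a {propertyKey, valueTypeName} pair for every property of one vertex.
    static List<String[]> map(Map<String, Object> vertexProperties) {
        List<String[]> emitted = new ArrayList<>();
        for (Map.Entry<String, Object> p : vertexProperties.entrySet()) {
            emitted.add(new String[]{p.getKey(), p.getValue().getClass().getSimpleName()});
        }
        return emitted;
    }

    // "Reduce" step: group emitted pairs by key, collecting every value type seen for that key.
    static SortedMap<String, SortedSet<String>> reduce(List<String[]> pairs) {
        SortedMap<String, SortedSet<String>> schema = new TreeMap<>();
        for (String[] kv : pairs) {
            schema.computeIfAbsent(kv[0], k -> new TreeSet<>()).add(kv[1]);
        }
        return schema;
    }

    // Run the map step over all vertices, then reduce the combined output.
    static SortedMap<String, SortedSet<String>> infer(List<Map<String, Object>> vertices) {
        List<String[]> pairs = new ArrayList<>();
        for (Map<String, Object> v : vertices) {
            pairs.addAll(map(v));
        }
        return reduce(pairs);
    }

    // Keys observed with more than one value type need a decision before loading.
    static SortedSet<String> conflicts(SortedMap<String, SortedSet<String>> schema) {
        SortedSet<String> out = new TreeSet<>();
        for (Map.Entry<String, SortedSet<String>> e : schema.entrySet()) {
            if (e.getValue().size() > 1) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
{code}

Keys reported by {{conflicts}} are the cases a graph-specific implementation would have to resolve before it could declare a schema and start the bulk load.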

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
