[ 
https://issues.apache.org/jira/browse/TINKERPOP3-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987611#comment-14987611
 ] 

ASF GitHub Bot commented on TINKERPOP3-904:
-------------------------------------------

GitHub user dkuppitz opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/133

    TINKERPOP3-904: BulkLoaderVertexProgram optimizations

    `BulkLoaderVertexProgram` now uses `EventStrategy` to monitor what the 
underlying `BulkLoader` implementation does (e.g. whether it creates a new 
vertex or returns an existing one). This way we don't need to modify method 
signatures and introduce breaking changes.
    
    Full integration test suite passed*
    
    VOTE: +1
    
    _* I had to exclude the Giraph integration tests, since they no longer 
work on my machine_
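
To illustrate the idea in the description above, here is a minimal, self-contained sketch of how a listener can reveal whether a get-or-create call actually created a vertex, without changing that call's signature. It does not use the TinkerPop API: `VertexListener`, `Loader`, and `CreationTracker` are hypothetical stand-ins for a mutation listener, a `BulkLoader` implementation, and the bookkeeping done by the vertex program, and the real pull request may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

public class CreationTrackingSketch {

    /** Hypothetical callback, standing in for a mutation listener. */
    interface VertexListener {
        void vertexAdded(String id);
    }

    /** Hypothetical loader; its method returns only the vertex, no "created" flag. */
    static class Loader {
        private final Map<String, Map<String, Object>> store = new HashMap<>();
        private final VertexListener listener;

        Loader(VertexListener listener) {
            this.listener = listener;
        }

        Map<String, Object> getOrCreateVertex(String id) {
            Map<String, Object> vertex = store.get(id);
            if (vertex == null) {
                vertex = new HashMap<>();
                store.put(id, vertex);
                listener.vertexAdded(id); // the event fires only on creation
            }
            return vertex;
        }
    }

    /** Remembers whether a vertex was created since the last reset. */
    static class CreationTracker implements VertexListener {
        private boolean created;

        @Override
        public void vertexAdded(String id) {
            created = true;
        }

        boolean resetAndGet() {
            boolean result = created;
            created = false;
            return result;
        }
    }

    public static void main(String[] args) {
        CreationTracker tracker = new CreationTracker();
        Loader loader = new Loader(tracker);

        loader.getOrCreateVertex("a"); // first call creates the vertex
        System.out.println(tracker.resetAndGet());

        loader.getOrCreateVertex("a"); // second call only retrieves it
        System.out.println(tracker.resetAndGet());
    }
}
```

The caller learns about creation through a side channel (the listener), so existing loader implementations keep their method signatures.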

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP3-904

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #133
    
----
commit 8b8222da7fed57fa1e7e0232ab788944716404f7
Author: Daniel Kuppitz <[email protected]>
Date:   2015-11-03T14:14:12Z

    Optimized BLVP
    
    BLVP now uses EventStrategy to monitor what the actual BulkLoader 
implementation does (e.g. whether it creates a new vertex or just returns an 
existing one).

commit 2a4b4ac67f4b1770a2ad69566fb819e42f312b07
Author: Daniel Kuppitz <[email protected]>
Date:   2015-11-03T16:48:56Z

    Merge branch 'master' into TINKERPOP3-904

commit 63c8cb87572ab2b6b3b3e386c5326edca017421e
Author: Daniel Kuppitz <[email protected]>
Date:   2015-11-03T16:52:31Z

    updated CHANGELOG

----


> BulkLoaderVertexProgram optimizations
> -------------------------------------
>
>                 Key: TINKERPOP3-904
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-904
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.0.2-incubating
>            Reporter: Daniel Kuppitz
>            Assignee: Daniel Kuppitz
>             Fix For: 3.1.0-incubating
>
>
> This is the continuation of 
> https://issues.apache.org/jira/browse/TINKERPOP3-319. A few suggestions were 
> made by [~mbroecheler] on how to optimize the current BLVP implementation. 
> Since these optimizations require breaking changes, they were not 
> implemented for 3.0.2.
> {quote}
> The following optimizations should be implemented to improve the performance 
> of BLVP:
> * In line 212, BLVP should get the information whether the vertex was created 
> or retrieved. If it was created (i.e. it did not exist before) then we are 
> guaranteed that it cannot have any vertex properties. As such, the BLVP 
> should then just create the vertex properties without checking for their 
> existence first - this will be significantly faster.
> * Similarly, when loading edges in the second iteration, it should first 
> compute this boolean variable {{requiresIncremental = 
> sourceVertex.edges(OUT).hasNext() && outV.edges(OUT).hasNext()}} and then 
> only do incremental loading on edges if this variable is true. If it is not 
> true, incremental loading (i.e. checking for edge existence) isn't necessary.
> Both improvements together should dramatically improve the performance of BLVP 
> since it will require a read per edge/vertex property only in those cases 
> where a previous job failed. Under "normal" operational conditions it only 
> requires one read per vertex per iteration. That is, the reads scale in 
> O(|V|) and not O(|E|).
> In addition, there should be an option for IncrementalBulkLoader so that it 
> does not attempt to update edges and vertex properties when those already 
> exist. In most cases, the edge will be identical when it has been loaded in a 
> previous job (since edge and property mutations are atomic in most graph 
> databases) and hence this check is unnecessary and being able to make it 
> optional can save time.
> Note that these are important optimizations for large-scale graph databases 
> where bulk loading is necessary to get started.
> {quote}
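
The two suggestions quoted above can be sketched with a toy in-memory graph. This is only an illustration under stated assumptions, not the BLVP implementation: the class `IncrementalLoadSketch` and all of its members are hypothetical, the `existenceChecks` counter stands in for per-element reads against the target graph, and `requiresIncremental` is simplified to "the target vertex already has outgoing edges" rather than the exact expression in the quote.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class IncrementalLoadSketch {
    // Toy target graph: vertex id -> properties, vertex id -> outgoing neighbour ids.
    final Map<String, Map<String, Object>> props = new HashMap<>();
    final Map<String, Set<String>> outEdges = new HashMap<>();
    int existenceChecks = 0; // counts simulated per-property / per-edge reads

    /** Returns true if the vertex did not exist and had to be created. */
    boolean getOrCreate(String id) {
        boolean created = !props.containsKey(id);
        if (created) {
            props.put(id, new HashMap<>());
        }
        return created;
    }

    void loadProperties(String id, Map<String, Object> source) {
        boolean created = getOrCreate(id);
        Map<String, Object> target = props.get(id);
        for (Map.Entry<String, Object> e : source.entrySet()) {
            if (!created) {
                existenceChecks++; // existing vertex: look before writing
                if (Objects.equals(e.getValue(), target.get(e.getKey()))) {
                    continue;
                }
            }
            target.put(e.getKey(), e.getValue()); // new vertex: write without checking
        }
    }

    void loadOutEdges(String id, Set<String> sourceNeighbours) {
        Set<String> target = outEdges.computeIfAbsent(id, k -> new HashSet<>());
        // One read per vertex decides whether per-edge checks are needed at all.
        boolean requiresIncremental = !target.isEmpty();
        for (String neighbour : sourceNeighbours) {
            if (requiresIncremental) {
                existenceChecks++;
                if (target.contains(neighbour)) {
                    continue;
                }
            }
            target.add(neighbour);
        }
    }

    public static void main(String[] args) {
        IncrementalLoadSketch graph = new IncrementalLoadSketch();
        Map<String, Object> properties = Collections.singletonMap("name", "marko");
        Set<String> neighbours = new HashSet<>(Arrays.asList("b", "c"));

        // First run: the vertex is new, so no per-element checks are performed.
        graph.loadProperties("a", properties);
        graph.loadOutEdges("a", neighbours);
        System.out.println("checks after first load: " + graph.existenceChecks);

        // Re-run (e.g. after a failed job): per-element checks are performed.
        graph.loadProperties("a", properties);
        graph.loadOutEdges("a", neighbours);
        System.out.println("checks after reload: " + graph.existenceChecks);
    }
}
```

On a clean first load the counter stays at zero, which corresponds to the claim in the quote that reads scale with the number of vertices rather than the number of edges; per-element checks only appear when data from an earlier run is already present.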


