[ 
https://issues.apache.org/jira/browse/TINKERPOP3-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989530#comment-14989530
 ] 

ASF GitHub Bot commented on TINKERPOP3-904:
-------------------------------------------

Github user spmallette commented on a diff in the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/133#discussion_r43880445
  
    --- Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/bulkloading/BulkLoaderVertexProgram.java ---
    @@ -407,4 +415,84 @@ public boolean requiresVertexPropertyAddition() {
                 }
             };
         }
    +
    +    static class BulkLoadingListener implements MutationListener {
    +
    +        private long counter;
    +        private boolean isNewVertex;
    +
    +        public BulkLoadingListener() {
    +            this.counter = 0L;
    +            this.isNewVertex = false;
    +            ;
    --- End diff --
    
    you have an empty statement here (the stray `;`) - just something to remove before you merge.


> BulkLoaderVertexProgram optimizations
> -------------------------------------
>
>                 Key: TINKERPOP3-904
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-904
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.0.2-incubating
>            Reporter: Daniel Kuppitz
>            Assignee: Daniel Kuppitz
>             Fix For: 3.1.0-incubating
>
>
> This is the continuation of 
> https://issues.apache.org/jira/browse/TINKERPOP3-319. A few suggestions were 
> made by [~mbroecheler] on how to optimize the current BLVP implementation. 
> Since these optimizations require breaking changes, they were not implemented 
> for 3.0.2.
> {quote}
> The following optimizations should be implemented to improve the performance 
> of BLVP:
> * In line 212, BLVP should get the information whether the vertex was created 
> or retrieved. If it was created (i.e. it did not exist before) then we are 
> guaranteed that it cannot have any vertex properties. As such, the BLVP 
> should then just create the vertex properties without checking for their 
> existence first - this will be significantly faster.
> * Similarly, when loading edges in the second iteration, it should first 
> compute this boolean variable {{requiresIncremental = 
> sourceVertex.edges(OUT).hasNext() && outV.edges(OUT).hasNext()}} and then 
> only do incremental loading on edges if this variable is true. If it is not 
> true, incremental loading (i.e. checking for edge existence) isn't necessary.
> Both improvements together should dramatically improve the performance of BLVP 
> since it will require a read per edge/vertex property only in those cases 
> where a previous job failed. Under "normal" operational conditions it only 
> requires one read per vertex per iteration. That is, the reads scale in 
> O(|V|) and not O(|E|).
> In addition, there should be an option for IncrementalBulkLoader so that it 
> does not attempt to update edges and vertex properties when those already 
> exist. In most cases, the edge will be identical when it has been loaded in a 
> previous job (since edge and property mutations are atomic in most graph 
> databases) and hence this check is unnecessary and being able to make it 
> optional can save time.
> Note that these are important optimizations for large-scale graph databases 
> where bulk loading is necessary to get started.
> {quote}
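
The first suggestion and the proposed IncrementalBulkLoader option can be sketched roughly as follows. This is only an illustration of the control flow, not the BLVP implementation: BulkLoadSketch, getOrCreateVertex, loadVertex, updateExisting and the reads counter are all hypothetical names, and the target graph is modelled as a plain in-memory map rather than through the TinkerPop API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the target graph. None of these names
// come from TinkerPop; they only illustrate the control flow.
class BulkLoadSketch {

    // vertex id -> (property key -> value)
    private final Map<String, Map<String, Object>> vertices = new HashMap<>();

    // Assumed option: overwrite properties that already exist?
    private final boolean updateExisting;

    // Counts simulated existence checks against the store.
    long reads = 0;

    BulkLoadSketch(boolean updateExisting) {
        this.updateExisting = updateExisting;
    }

    // Returns true if the vertex was created, false if it was retrieved.
    boolean getOrCreateVertex(String id) {
        reads++; // one read per vertex, always needed
        if (vertices.containsKey(id)) {
            return false;
        }
        vertices.put(id, new HashMap<>());
        return true;
    }

    void loadVertex(String id, Map<String, Object> properties) {
        boolean created = getOrCreateVertex(id);
        Map<String, Object> target = vertices.get(id);
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            if (created) {
                // A freshly created vertex cannot have properties yet,
                // so write without checking for existence.
                target.put(e.getKey(), e.getValue());
                continue;
            }
            reads++; // incremental path: one read per property
            boolean exists = target.containsKey(e.getKey());
            if (!exists || updateExisting) {
                target.put(e.getKey(), e.getValue());
            }
        }
    }

    Object property(String id, String key) {
        Map<String, Object> props = vertices.get(id);
        return props == null ? null : props.get(key);
    }

    public static void main(String[] args) {
        BulkLoadSketch graph = new BulkLoadSketch(false);
        Map<String, Object> props = new HashMap<>();
        props.put("name", "marko");
        props.put("age", 29);

        graph.loadVertex("v1", props); // fresh load
        System.out.println("reads after first load: " + graph.reads);

        graph.loadVertex("v1", props); // simulated re-run after a failed job
        System.out.println("reads after re-run: " + graph.reads);
    }
}
```

In this sketch a fresh load costs one read per vertex regardless of how many properties it carries, while a re-run over an existing vertex pays one extra read per property. The same pattern would apply to edges in the second iteration, guarded by the requiresIncremental flag described in the quote.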



