[ https://issues.apache.org/jira/browse/TINKERPOP3-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987611#comment-14987611 ]
ASF GitHub Bot commented on TINKERPOP3-904:
-------------------------------------------
GitHub user dkuppitz opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/133
TINKERPOP3-904: BulkLoaderVertexProgram optimizations
`BulkLoaderVertexProgram` now uses `EventStrategy` to monitor what the
underlying `BulkLoader` implementation does (e.g. whether it creates a new
vertex or returns an existing one). This way we don't need to modify method
signatures and introduce breaking changes.
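The idea of learning the loader's outcome from events instead of from its return value can be sketched in plain Java. This is not TinkerPop code: `ObservableGraph`, `VertexAddedListener`, `CreationTracker` and `SimpleBulkLoader` are hypothetical stand-ins for a graph wrapped with `EventStrategy`, a `MutationListener`, and a `BulkLoader` implementation. Only the pattern is meant to carry over: the loader's signature stays unchanged, and a listener records whether a vertex was added during the call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a mutation listener, reduced to one callback.
interface VertexAddedListener {
    void vertexAdded(String vertexId);
}

// Hypothetical minimal graph that fires an event whenever a vertex is created,
// playing the role EventStrategy plays for a real traversal source.
class ObservableGraph {
    private final Map<String, Map<String, Object>> vertices = new HashMap<>();
    private final List<VertexAddedListener> listeners = new ArrayList<>();

    void addListener(VertexAddedListener listener) {
        listeners.add(listener);
    }

    // Returns the existing vertex or creates it; only creation fires an event.
    Map<String, Object> getOrCreate(String id) {
        Map<String, Object> vertex = vertices.get(id);
        if (vertex == null) {
            vertex = new HashMap<>();
            vertices.put(id, vertex);
            for (VertexAddedListener listener : listeners) {
                listener.vertexAdded(id);
            }
        }
        return vertex;
    }
}

// Hypothetical loader: its return type does not reveal whether it created the vertex.
class SimpleBulkLoader {
    Map<String, Object> getOrCreateVertex(ObservableGraph graph, String id) {
        return graph.getOrCreate(id);
    }
}

// Records whether the last loader call created a vertex.
class CreationTracker implements VertexAddedListener {
    private boolean created;

    void reset() {
        created = false;
    }

    boolean wasCreated() {
        return created;
    }

    @Override
    public void vertexAdded(String vertexId) {
        created = true;
    }
}

class EventTrackingDemo {
    public static void main(String[] args) {
        ObservableGraph graph = new ObservableGraph();
        CreationTracker tracker = new CreationTracker();
        graph.addListener(tracker);
        SimpleBulkLoader loader = new SimpleBulkLoader();
        for (String id : new String[] {"v1", "v1"}) {
            tracker.reset();
            loader.getOrCreateVertex(graph, id);
            // prints "v1 created=true", then "v1 created=false"
            System.out.println(id + " created=" + tracker.wasCreated());
        }
    }
}
```

The caller resets the tracker before each loader call and reads it afterwards, so a newly created vertex can take a faster path without the loader having to return an extra flag.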
Full integration test suite passed*
VOTE: +1
_* I had to exclude the Giraph integration tests, since Giraph no longer works
on my machine_
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP3-904
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/133.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #133
----
commit 8b8222da7fed57fa1e7e0232ab788944716404f7
Author: Daniel Kuppitz <[email protected]>
Date: 2015-11-03T14:14:12Z
Optimized BLVP
BLVP now uses EventStrategy to monitor what the actual BulkLoader
implementation does (e.g. whether it creates a new vertex or just returns an
existing one).
commit 2a4b4ac67f4b1770a2ad69566fb819e42f312b07
Author: Daniel Kuppitz <[email protected]>
Date: 2015-11-03T16:48:56Z
Merge branch 'master' into TINKERPOP3-904
commit 63c8cb87572ab2b6b3b3e386c5326edca017421e
Author: Daniel Kuppitz <[email protected]>
Date: 2015-11-03T16:52:31Z
updated CHANGELOG
----
> BulkLoaderVertexProgram optimizations
> -------------------------------------
>
> Key: TINKERPOP3-904
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-904
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.0.2-incubating
> Reporter: Daniel Kuppitz
> Assignee: Daniel Kuppitz
> Fix For: 3.1.0-incubating
>
>
> This is the continuation of
> https://issues.apache.org/jira/browse/TINKERPOP3-319. A few suggestions were
> made by [~mbroecheler] on how to optimize the current BLVP implementation.
> Since these optimizations require breaking changes, they were not implemented
> for 3.0.2.
> {quote}
> The following optimizations should be implemented to improve the performance
> of BLVP:
> * In line 212, BLVP should learn whether the vertex was created
> or retrieved. If it was created (i.e. it did not exist before), then it is
> guaranteed to have no vertex properties. As such, BLVP
> should just create the vertex properties without checking for their
> existence first, which will be significantly faster.
> * Similarly, when loading edges in the second iteration, BLVP should first
> compute the boolean variable {{requiresIncremental =
> sourceVertex.edges(OUT).hasNext() && outV.edges(OUT).hasNext()}} and then
> only do incremental loading on edges if this variable is true. If it is
> false, incremental loading (i.e. checking for edge existence) isn't necessary.
> Together, both improvements should dramatically increase the performance of
> BLVP, since a read per edge/vertex property will only be required in those
> cases where a previous job failed. Under "normal" operational conditions, BLVP
> only requires one read per vertex per iteration. That is, the reads scale in
> O(|V|) and not O(|E|).
> In addition, there should be an option for IncrementalBulkLoader so that it
> does not attempt to update edges and vertex properties when those already
> exist. In most cases, the edge will be identical when it has been loaded in a
> previous job (since edge and property mutations are atomic in most graph
> databases) and hence this check is unnecessary and being able to make it
> optional can save time.
> Note that these are important optimizations for large-scale graph databases
> where bulk loading is necessary to get started.
> {quote}
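The quoted proposals (write properties blindly for newly created vertices, only check edge existence when {{requiresIncremental}} holds, and optionally leave existing data untouched) can be illustrated with a minimal in-memory sketch. It is not BLVP or `IncrementalBulkLoader` code: `SketchVertex`, `BlvpSketch` and the `skipExisting` flag are hypothetical names. It assumes that source and target vertices share the same ids, that an edge is identified by its label and target id, and it models the skip option only for vertex properties. A counter of existence checks shows when reads are avoided.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vertex model: properties plus out-edges, each edge as [label, targetId].
class SketchVertex {
    final String id;
    final Map<String, Object> props = new HashMap<>();
    final List<List<String>> outEdges = new ArrayList<>();

    SketchVertex(String id) {
        this.id = id;
    }
}

class BlvpSketch {
    final Map<String, SketchVertex> target = new HashMap<>();
    final boolean skipExisting; // hypothetical "don't update what already exists" option
    int existenceChecks = 0;    // counts per-property and per-edge reads

    BlvpSketch(boolean skipExisting) {
        this.skipExisting = skipExisting;
    }

    // Iteration 1: load a vertex and its properties.
    SketchVertex loadVertex(SketchVertex source) {
        SketchVertex existing = target.get(source.id);
        if (existing == null) {
            // Newly created: it cannot have properties yet, so write them without checks.
            SketchVertex created = new SketchVertex(source.id);
            created.props.putAll(source.props);
            target.put(source.id, created);
            return created;
        }
        // Retrieved (e.g. a previous job failed): every property needs a check.
        for (Map.Entry<String, Object> p : source.props.entrySet()) {
            existenceChecks++;
            if (skipExisting) {
                existing.props.putIfAbsent(p.getKey(), p.getValue());
            } else {
                existing.props.put(p.getKey(), p.getValue());
            }
        }
        return existing;
    }

    // Iteration 2: load out-edges; assumes loadVertex already ran for every vertex.
    void loadEdges(SketchVertex source) {
        SketchVertex outV = target.get(source.id);
        // The quoted condition, evaluated before any edge of this vertex is added.
        boolean requiresIncremental = !source.outEdges.isEmpty() && !outV.outEdges.isEmpty();
        for (List<String> edge : source.outEdges) {
            if (requiresIncremental) {
                existenceChecks++;
                if (outV.outEdges.contains(edge)) {
                    continue; // identical edge already loaded by a previous job
                }
            }
            outV.outEdges.add(new ArrayList<>(edge));
        }
    }

    public static void main(String[] args) {
        SketchVertex a = new SketchVertex("a");
        a.props.put("name", "marko");
        a.outEdges.add(Arrays.asList("knows", "b"));
        SketchVertex b = new SketchVertex("b");
        b.props.put("name", "vadas");

        BlvpSketch blvp = new BlvpSketch(false);
        for (int run = 1; run <= 2; run++) { // run 2 simulates re-running a failed job
            blvp.loadVertex(a);
            blvp.loadVertex(b);
            blvp.loadEdges(a);
            blvp.loadEdges(b);
            // prints checks=0 after run 1 and checks=3 after run 2; a keeps 1 edge
            System.out.println("after run " + run + ": checks=" + blvp.existenceChecks
                    + ", edges of a=" + blvp.target.get("a").outEdges.size());
        }
    }
}
```

On a fresh load no existence checks happen at all, so reads stay at one lookup per vertex per iteration; only the re-run pays for per-property and per-edge checks.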
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)