[ https://issues.apache.org/jira/browse/TINKERPOP-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643168#comment-17643168 ]
ASF GitHub Bot commented on TINKERPOP-2834:
-------------------------------------------
ministat opened a new pull request, #1885:
URL: https://github.com/apache/tinkerpop/pull/1885
CloneVertexProgram does nothing in its execute method, yet SparkGraphComputer
still runs it as a general VertexProgram, which requires a shuffle stage. That
stage is unnecessary here, so this PR implements a shortcut that skips it. When
I exported two big graphs, the overall export time dropped by about a third for
Graph 1 and by about 27% for Graph 2. See the following table.
```
-------------------------------------
            | Graph 1  | Graph 2
-------------------------------------
Before fix  | 3.6 h    | 22 min
-------------------------------------
After fix   | 2.4 h    | 16 min
-------------------------------------
```
Graph 1 has 15 billion vertices and 23 billion edges. Graph 2 has 130 million
vertices and 650 million edges.
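To illustrate the idea, here is a minimal, self-contained sketch. It is not the
code in this pull request and it does not use the TinkerPop or Spark APIs: the
names CloneShortcutSketch, Interceptor, Run, submit and CLONE_INTERCEPTOR are
hypothetical stand-ins, and the "shuffle" is only simulated by recording a
stage name. It shows a submit path that skips the shuffle stage when an
interceptor is registered for a program whose execute step does nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the shortcut. None of these types are TinkerPop classes.
final class CloneShortcutSketch {

    /** Hypothetical stand-in for an interceptor that replaces the general path. */
    interface Interceptor {
        List<String> apply(List<String> vertices);
    }

    /** Result of one run: the exported vertices and the stages that were executed. */
    static final class Run {
        final List<String> output;
        final List<String> stages;

        Run(List<String> output, List<String> stages) {
            this.output = output;
            this.stages = stages;
        }
    }

    /** An interceptor with an effectively empty body: the graph passes through unchanged. */
    static final Interceptor CLONE_INTERCEPTOR = vertices -> vertices;

    /**
     * Runs a clone job. Without an interceptor the general path is taken, which
     * includes a (simulated) shuffle stage. With an interceptor that stage is skipped.
     */
    static Run submit(List<String> vertices, Interceptor interceptor) {
        List<String> stages = new ArrayList<>();
        stages.add("read");
        List<String> output;
        if (interceptor != null) {
            // Shortcut: hand the loaded graph straight to the interceptor.
            output = interceptor.apply(vertices);
        } else {
            // General path: every vertex program pays for the shuffle.
            stages.add("shuffle");
            output = new ArrayList<>(vertices);
        }
        stages.add("write");
        return new Run(output, stages);
    }

    public static void main(String[] args) {
        List<String> graph = Arrays.asList("v1", "v2", "v3");
        System.out.println(submit(graph, null).stages);              // [read, shuffle, write]
        System.out.println(submit(graph, CLONE_INTERCEPTOR).stages); // [read, write]
    }
}
```

Both paths produce the same output vertices; only the set of executed stages
differs, which is where the time saving in the table would come from under
this simplified model.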
> CloneVertexProgram optimization on SparkGraphComputer
> -----------------------------------------------------
>
> Key: TINKERPOP-2834
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2834
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Reporter: Redriver
> Priority: Major
>
> The CloneVertexProgram does nothing in its execute() method, but
> SparkGraphComputer still processes it with standard GraphComputer semantics,
> which incurs a lot of unnecessary computation. In fact, registering a special
> SparkVertexProgramInterceptor with an empty apply() can improve the overall
> performance considerably.