[ https://issues.apache.org/jira/browse/SPARK-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobias Bertelsen updated SPARK-9937:
------------------------------------
    Attachment: Scaleservers-lin.png

> GraphX Performance: Partition overhead scales quadratically
> -----------------------------------------------------------
>
>                 Key: SPARK-9937
>                 URL: https://issues.apache.org/jira/browse/SPARK-9937
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Tobias Bertelsen
>         Attachments: Scaleservers-lin.png
>
>
> Hello everybody, particularly GraphX developers.
> I am working on an algorithm that combines normal RDD operations and graph 
> operations. When I tested its scalability, I discovered that adding more 
> worker nodes made most stages run faster, but the graph operations ran 
> slower.
> More specifically, with twice the number of servers the graph operations 
> took twice as long, indicating that the total amount of work increased 
> fourfold. I have attached a plot of the runtime for different numbers of 
> servers; the graph operations are labelled "clustering" in the plot.
> I looked into the code and I think I found something that might be the 
> cause.
> The operations shipVertexAttributes and shipVertexIds in VertexRDDImpl seem 
> to generate RDDs that contain an element for every combination of vertex 
> partition and edge partition, even if there is no connection between the 
> two.
> The result is that this overhead ends up dominating the computation time.
> I am not familiar with the design and code base of GraphX. Perhaps there 
> are more problems of this kind that cause scaling issues.
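One way to probe the observation above, assuming the overhead really does scale with the number of (vertex partition, edge partition) pairs, is to hold the graph size fixed and vary only the partition count while timing a trivial message round. The sketch below is illustrative only: the graph shape, the sizes, and the object name PartitionOverheadRepro are assumptions and not taken from the original report.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object PartitionOverheadRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PartitionOverheadRepro"))

    val numVertices = 1000000L
    for (numPartitions <- Seq(8, 16, 32, 64)) {
      // A sparse ring graph: every vertex has exactly one outgoing edge,
      // so the useful work is constant; only the partition count changes.
      val edges = sc.parallelize(0L until numVertices, numPartitions)
        .map(i => Edge(i, (i + 1) % numVertices, 1))
      val graph = Graph.fromEdges(edges, 0).cache()
      graph.edges.count() // force materialization before timing

      val start = System.nanoTime()
      // One trivial aggregateMessages round; the per-message work is fixed,
      // so any growth in runtime comes from per-partition bookkeeping.
      graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _).count()
      val elapsedMs = (System.nanoTime() - start) / 1e6
      println(s"partitions=$numPartitions, aggregateMessages: $elapsedMs ms")

      graph.unpersist(blocking = false)
    }
    sc.stop()
  }
}

If the measured time grows roughly with the square of the partition count even though the graph itself is unchanged, that would be consistent with shipVertexAttributes and shipVertexIds materialising one element per (vertex partition, edge partition) pair regardless of whether the two are connected.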



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
