Re: Scalability results for GoldenOrb and comparison with Giraph

Avery Ching Sun, 11 Dec 2011 11:02:52 -0800

Hi Jon,

[email protected] (so as to not clog up their mailing list uninvited)

First of all, thank you for sharing this comparison. I would like to note a few things. The results I posted in October 2011 were actually a bit old (done in June 2011) and do not have several improvements that reduce memory usage significantly (i.e. GIRAPH-12 and GIRAPH-91). The number of vertices loadable per worker is highly dependent on the number of edges per worker, the amount of available heap memory, number of messages, the balancing of the graph across the workers, etc. In recent tests at Facebook, I have been able to load over 10 million vertices / worker easily with 20 edges / vertex. I know that you wrote that the maximum per worker was at least 1.6 million vertices for Giraph; I just wanted to let folks know that it's in fact much higher. We'll work on continuing to improve that in the future as today's graph problems are in the billions of vertices or rather hundreds of billions =).

Also, with respect to scalability, if I'm interpreting these results correctly, does it mean that GoldenOrb is currently unable to load more than 250k vertices / cluster as observed by former Ravel developers? If so, given the small tests and overhead per superstep, I wouldn't expect the scalability to be much improved by more workers. Also, the max value and shortest paths algorithms are highly data dependent in how many messages are passed around per superstep, and are perhaps not a fair scaling comparison with Giraph's page rank benchmark test, which was designed for scalability testing (equal messages per superstep distributed evenly across vertices). Would be nice to see an apples-to-apples comparison if someone has the time...=)


Thanks,

Avery

On 12/10/11 3:16 PM, Jon Allen wrote:

Since GoldenOrb was released this past summer, a number of people have asked 
questions regarding scalability and performance testing, as well as a 
comparison of these results with those of Giraph ( 
http://incubator.apache.org/giraph/ ), so I went forward with running tests to 
help answer some of these questions.

A full report of the scalability testing results, along with methodology 
details, relevant information regarding testing and analysis, links to data 
points for Pregel and Giraph, scalability testing references, and background 
mathematics, can be found here:

http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/

Since this data will also be of interest to the Giraph community (for 
methodology, background references, and analysis reasons), I am cross posting 
to the Giraph user mailing list.

A synopsis of the scalability results for GoldenOrb, and comparison data points 
for Giraph and Google's Pregel framework are provided below.

The setup and execution of GoldenOrb scalability tests were conducted by three 
former Ravel (http://www.raveldata.com ) developers, including myself, with 
extensive knowledge of the GoldenOrb code base and optimal system 
configurations, ensuring the most optimal settings were used for scalability 
testing.


RESULTS SUMMARY:


MAX CAPACITY:

Pregel (at least): 166,666,667 vertices per node.

Giraph (at least): 1,666,667 vertices per worker.

GoldenOrb: ~ 100,000 vertices per node, 33,333 vertices per worker.


STRONG SCALING (SSSP):
Note: Optimal parallelization corresponds to the minimum value -1.0. Deviation 
from the minimum possible value of -1.0 corresponds to non-optimal 
parallelization.

Pregel: -0.924 (1 billion total vertices)

Giraph: -0.934 (250 million total vertices)

GoldenOrb: -0.031 Average, -0.631 Best (100,000 total vertices), 0.020 Worst 
(1,000 total vertices)


WEAK SCALING (SSSP):
Note: Optimal weak scalability corresponds to the value 0.0. Deviation from the 
optimal value of 0.0, corresponds to non-optimal usage of computational 
resources as managed by the framework.

Pregel: No Data Available

Giraph: 0.01 (1,666,667 vertices per worker)

GoldenOrb: 0.37 Average, 0.23 Best (500 vertices per node), 0.48 Worst (12,500 
vertices per node)
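
The strong and weak scaling figures above read like slopes on a log-log plot of runtime against worker count. That reading is an assumption here, since the exact method is given only in the full report. Under it, a minimal sketch of how such an exponent could be computed follows. The function name and all timing values are hypothetical and are not taken from any of the tests.

```python
import math


def scaling_exponent(workers, runtimes):
    """Least-squares slope of log(runtime) versus log(workers).

    Assumed interpretation (not confirmed by the summary above):
      strong scaling, fixed total problem size -> ideal slope is -1.0
      weak scaling, fixed work per worker      -> ideal slope is  0.0
    """
    if len(workers) != len(runtimes) or len(workers) < 2:
        raise ValueError("need at least two (workers, runtime) pairs")
    xs = [math.log(w) for w in workers]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("worker counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical timings in seconds, for illustration only.
workers = [1, 2, 4, 8]
ideal_strong = [800.0 / w for w in workers]  # runtime halves as workers double
ideal_weak = [800.0 for _ in workers]        # runtime stays flat as load grows with workers

print(round(scaling_exponent(workers, ideal_strong), 3))
print(round(scaling_exponent(workers, ideal_weak), 3))
```

With these made-up inputs the first exponent comes out at approximately -1.0 and the second at approximately 0.0, matching the optimal values quoted in the notes. A strong scaling value near zero, or a weak scaling value well above zero, would mean that adding workers is not reducing runtime or is adding overhead.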



I hope this answers some of the many questions which have been posted regarding 
scalability and performance. Be sure to check out the full scalability testing 
report at http://wwwrel.ph.utexas.edu/Members/jon/golden_orb/  Please let me 
know if you have any questions.

Thanks,
Jon
