By the way, here are the Apache Giraph talks and presentations I found:
“Giraph: Large-scale graph processing on Hadoop”, Avery Ching
Hadoop Summit 2011 - Santa Clara, California - June 2011
“Apache Giraph: Distributed Graph Processing in the Cloud”, Claudio Martella
FOSDEM 2012 - Brussels, Belgium - February 2012
“Introducing Apache Giraph for Large Scale Graph Processing”, Sebastian Schelter
Apache Hadoop Get Together - Berlin, Germany - April 2012
You could put the links on the Apache Giraph wiki.
First of all, thank you for sharing them. May I add a few comments and
suggestions for future presentations? (Please don't take this as criticism.)
It would be good to show users a couple of non-trivial examples and one or
two 'real' use cases where Apache Giraph is used for processing large graphs.
Apache Giraph comes with two examples: single-source shortest paths
and PageRank. Google's Pregel paper describes 'bipartite matching' and
'semi-clustering'. Is anyone working on implementing these in Giraph?
Or, what if in the shortest paths example you actually want to know the path?
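To make the "actual path" question concrete, here is a minimal sketch of what I
have in mind. It is plain Python that simulates Pregel-style supersteps, not
Giraph code; the function names (sssp_with_paths, reconstruct_path) and the
dict-based graph layout are my own assumptions for illustration. The idea is
that each message carries the sender's id next to the tentative distance, so
every vertex can remember its predecessor. It assumes non-negative edge weights.

```python
import math


def sssp_with_paths(edges, source):
    """Simulate Pregel-style supersteps for single-source shortest paths.

    edges maps vertex -> {neighbour: weight}. Each vertex keeps a distance
    and the predecessor that produced it. Assumes non-negative weights.
    """
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(targets)

    dist = {v: math.inf for v in vertices}
    pred = {v: None for v in vertices}

    # Superstep 0: the source "receives" distance 0 with no predecessor.
    inbox = {source: [(0, None)]}
    while inbox:
        outbox = {}
        for v, messages in inbox.items():
            # Keep only the best (distance, sender) pair received this superstep.
            best_dist, best_pred = min(messages, key=lambda m: m[0])
            if best_dist < dist[v]:
                dist[v] = best_dist
                pred[v] = best_pred
                # Propagate the improved distance, tagged with our own id.
                for target, weight in edges.get(v, {}).items():
                    outbox.setdefault(target, []).append((best_dist + weight, v))
        # Vertices without incoming messages stay halted.
        inbox = outbox
    return dist, pred


def reconstruct_path(pred, source, target):
    """Walk predecessors back from target; return None if unreachable."""
    path = [target]
    while path[-1] != source:
        parent = pred[path[-1]]
        if parent is None:
            return None
        path.append(parent)
    path.reverse()
    return path


if __name__ == "__main__":
    graph = {"a": {"b": 1, "c": 4}, "b": {"c": 2, "d": 6}, "c": {"d": 3}}
    dist, pred = sssp_with_paths(graph, "a")
    print(dist["d"], reconstruct_path(pred, "a", "d"))
```

In a real Giraph job the predecessor would have to live in the vertex value and
the message type, and the path would be rebuilt from the job output afterwards;
I have not checked how the bundled example would need to change for that.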
It would be great to have examples of more advanced features: custom
partitioning functions, aggregators, ...
Personally, I'd like to see a side-by-side comparison of Google's Pregel, as
described in their paper, and the Giraph implementation (I am particularly
interested in where they diverge and why).
Another question (or something I am not sure about) concerns 'capacity
planning' (sort of...). Given a dataset and an algorithm implemented in Giraph,
how do you determine how many workers are needed (so that the whole graph and
the messages for each superstep fit in RAM)?
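The kind of back-of-envelope estimate I am imagining looks roughly like the
sketch below. Everything in it is an assumption on my part: the function name
estimate_workers, the per-vertex, per-edge and per-message byte costs, and the
skew_factor for uneven partitions are hypothetical inputs that would have to be
measured for a given Giraph job, not numbers Giraph provides.

```python
import math


def estimate_workers(num_vertices, num_edges, peak_messages,
                     bytes_per_vertex, bytes_per_edge, bytes_per_message,
                     usable_heap_bytes_per_worker, skew_factor=1.0):
    """Rough lower bound on workers so graph plus messages fit in RAM.

    All byte costs are hypothetical inputs to be measured for the real job.
    skew_factor > 1 pads the estimate for unevenly sized partitions.
    """
    graph_bytes = num_vertices * bytes_per_vertex + num_edges * bytes_per_edge
    # Messages of the busiest superstep must fit alongside the graph.
    message_bytes = peak_messages * bytes_per_message
    total_bytes = (graph_bytes + message_bytes) * skew_factor
    return max(1, math.ceil(total_bytes / usable_heap_bytes_per_worker))


if __name__ == "__main__":
    # Placeholder figures only, not measurements: one message per edge per
    # superstep, as a PageRank-like algorithm would send.
    workers = estimate_workers(
        num_vertices=1_000_000, num_edges=10_000_000, peak_messages=10_000_000,
        bytes_per_vertex=100, bytes_per_edge=20, bytes_per_message=16,
        usable_heap_bytes_per_worker=100_000_000)
    print(workers)
```

What I do not know is how to obtain realistic values for those inputs, which is
really the heart of my question.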
Last but not least, it seems to me that PageRank is what you use to 'benchmark'
Giraph. Is that the case? If so, sharing a common dataset for others to use
would be a first step towards letting people compare the performance of
different software running the very same algorithm over the same data and on
the same hardware infrastructure.
Sebastian Schelter wrote:
> I will give a talk titled "Large Scale Graph Processing with Apache
> Giraph" in Berlin on May 29th. Details are available at: