> It would be good to present users a couple of non-trivial examples and one
> or two 'real' use cases where Apache Giraph is used for processing large
> graphs.
> Apache Giraph comes with two examples: all shortest paths from a single
> source vertex, and PageRank. Google's Pregel paper describes 'bipartite matching' and
> 'semi-clustering'. Is anyone working on implementing these in Giraph?
> Or, what if in the shortest paths example you actually want to know the
> paths themselves, not just their lengths?
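The single-source shortest paths example mentioned above can be sketched in the vertex-centric model like this. This is a minimal plain-Java simulation of supersteps and message passing, not Giraph's actual API (the class and method names here are illustrative):

```java
import java.util.*;

// Vertex-centric single-source shortest paths, simulated superstep by
// superstep in plain Java. Mirrors the Pregel/Giraph model: a vertex is
// "active" when it receives a message, and the computation halts when
// no messages remain.
public class SsspSketch {
    // edges[u] = list of {target, weight}
    static double[] sssp(int n, List<int[]>[] edges, int source) {
        double[] dist = new double[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        // messages delivered at the start of each superstep
        Map<Integer, Double> inbox = new HashMap<>();
        inbox.put(source, 0.0);
        while (!inbox.isEmpty()) {            // halt when no vertex is active
            Map<Integer, Double> outbox = new HashMap<>();
            for (Map.Entry<Integer, Double> m : inbox.entrySet()) {
                int u = m.getKey();
                double d = m.getValue();
                if (d < dist[u]) {            // improved distance: update and
                    dist[u] = d;              // propagate along out-edges
                    for (int[] e : edges[u])
                        outbox.merge(e[0], d + e[1], Math::min);
                }
            }
            inbox = outbox;
        }
        return dist;
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        List<int[]>[] edges = new List[4];
        for (int i = 0; i < 4; i++) edges[i] = new ArrayList<>();
        edges[0].add(new int[]{1, 1});
        edges[0].add(new int[]{2, 4});
        edges[1].add(new int[]{2, 2});
        edges[2].add(new int[]{3, 1});
        System.out.println(Arrays.toString(sssp(4, edges, 0)));
        // prints [0.0, 1.0, 3.0, 4.0]
    }
}
```

To recover the paths themselves rather than just the lengths, each message would also carry the sender's id so a vertex can remember its predecessor.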
I have some toy code (not really well tested) that implements b-matching
(that is, matching with integer capacities on the nodes).
It's a simple greedy method, along the lines of the one described here
I can share it if you are interested.
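For readers unfamiliar with the idea, a greedy b-matching can be sketched roughly like this. This is a generic illustration of the technique, not the toy code mentioned above (all names are mine):

```java
import java.util.*;

// Greedy b-matching: each vertex v may be matched to at most cap[v]
// neighbors. Sort edges by descending weight and take an edge whenever
// both endpoints still have spare capacity. A simple heuristic, not an
// exact algorithm.
public class GreedyBMatching {
    // edges: {u, v, weight}; returns the chosen edges
    static List<int[]> match(List<int[]> edges, int[] cap) {
        int[] remaining = cap.clone();
        List<int[]> sorted = new ArrayList<>(edges);
        sorted.sort((a, b) -> Integer.compare(b[2], a[2]));
        List<int[]> chosen = new ArrayList<>();
        for (int[] e : sorted) {
            if (remaining[e[0]] > 0 && remaining[e[1]] > 0) {
                chosen.add(e);
                remaining[e[0]]--;
                remaining[e[1]]--;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<int[]> edges = Arrays.asList(
            new int[]{0, 1, 5}, new int[]{0, 2, 4}, new int[]{1, 2, 3});
        // vertex 0 can take two partners, vertices 1 and 2 one each
        for (int[] e : match(edges, new int[]{2, 1, 1}))
            System.out.println(e[0] + "-" + e[1] + " w=" + e[2]);
        // prints 0-1 w=5 then 0-2 w=4
    }
}
```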
> It would be great to have examples of more advanced features: custom
> partitioning functions, aggregators, ...
> Personally, I'd like to see a side-by-side comparison of Google's Pregel as
> described in their paper and the Giraph implementation (I am particularly
> interested in where they diverge and why).
> Another question (or thing I am not so sure about) is about 'capacity
> planning' (sort of...). Given a dataset and an algorithm implemented in Giraph, how
> do you determine how many workers would be needed (in order to fit all your graph
> and the messages for each superstep in RAM)?
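One way to approach the question is a back-of-envelope estimate: assume the graph state plus one superstep's worth of messages must fit in aggregate worker RAM. The per-vertex, per-edge, and per-message byte counts below are placeholders you would measure for your own value types, and JVM/Giraph overhead will add more on top:

```java
// Rough worker-count estimate under the assumption that graph state and
// one superstep's messages must fit in aggregate worker RAM. All byte
// counts are illustrative guesses, not measured Giraph figures.
public class WorkerEstimate {
    static long workersNeeded(long vertices, long edges, long messagesPerSuperstep,
                              long bytesPerVertex, long bytesPerEdge,
                              long bytesPerMessage, long usableRamPerWorkerBytes) {
        long total = vertices * bytesPerVertex
                   + edges * bytesPerEdge
                   + messagesPerSuperstep * bytesPerMessage;
        // ceiling division
        return (total + usableRamPerWorkerBytes - 1) / usableRamPerWorkerBytes;
    }

    public static void main(String[] args) {
        // e.g. 100M vertices, 1B edges, one message per edge per superstep,
        // ~100 bytes/vertex, ~20 bytes per edge and per message,
        // 8 GB usable heap per worker
        long w = workersNeeded(100_000_000L, 1_000_000_000L, 1_000_000_000L,
                               100, 20, 20, 8L << 30);
        System.out.println(w + " workers");   // prints 6 workers
    }
}
```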
> Last but not least, it seems to me that PageRank is what you use to
> benchmark Giraph, is that the case? If so, sharing a common dataset for
> others to use would be a first step toward letting people compare the
> performance of different software running the very same algorithm over the
> same data and the same hardware infrastructure.
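For reference, the PageRank used in such benchmarks is the Pregel-paper formulation: run a fixed number of supersteps, each vertex sends rank/outdegree to its neighbors and updates to 0.15/n + 0.85 * sum(incoming). A plain-Java simulation of that loop (not the Giraph example code itself):

```java
import java.util.*;

// Vertex-centric PageRank as described in the Pregel paper, simulated
// in plain Java for a fixed number of supersteps.
public class PageRankSketch {
    static double[] pageRank(int n, List<Integer>[] out, int supersteps) {
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int s = 0; s < supersteps; s++) {
            double[] incoming = new double[n];
            for (int u = 0; u < n; u++)
                for (int v : out[u])          // send rank/outdegree to neighbors
                    incoming[v] += rank[u] / out[u].size();
            for (int v = 0; v < n; v++)       // damped update
                rank[v] = 0.15 / n + 0.85 * incoming[v];
        }
        return rank;
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        List<Integer>[] out = new List[3];
        out[0] = List.of(1);                  // 3-cycle: 0 -> 1 -> 2 -> 0
        out[1] = List.of(2);
        out[2] = List.of(0);
        for (double r : pageRank(3, out, 10))
            System.out.printf("%.4f%n", r);   // each prints 0.3333
    }
}
```

On a symmetric cycle the ranks stay at 1/n, which makes a handy sanity check for any implementation.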
> Sebastian Schelter wrote:
> > Hi,
> > I will give a talk titled "Large Scale Graph Processing with Apache
> > Giraph" in Berlin on May 29th. Details are available at:
> > Best,
> > Sebastian