> It would be good to present users with a couple of non-trivial examples and
> one or two 'real' use cases where Apache Giraph is used for processing large
> graphs.
> Apache Giraph comes with two examples: all shortest paths from a single
> source, and PageRank. Google's Pregel paper describes 'bipartite matching'
> and 'semi-clustering'. Is anyone working on implementing these in Giraph?
> Or, what if in the shortest paths example you actually want to know the
> path?
I have some toy code (not really well tested) that implements b-matching
(that is, matching with integer capacities on the nodes).
It's a simple greedy method, along the lines of the one described here

I can share it if you are interested.
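
To give an idea of what "simple greedy" can mean for b-matching, here is a minimal sequential sketch in plain Python. It is not the Giraph code mentioned above and not necessarily the method from the reference: it assumes weighted edges and simply takes the heaviest edges first while both endpoints still have spare capacity. The function name and the input layout are made up for illustration.

```python
def greedy_b_matching(edges, capacity):
    """Greedy b-matching sketch (illustrative, sequential, not Giraph code).

    edges:    iterable of (u, v, weight) tuples
    capacity: dict mapping node -> maximum number of matched edges
    """
    remaining = dict(capacity)  # copy so the caller's dict is not modified
    matched = []
    # Heaviest edges first; ties are broken by endpoint names for determinism.
    for u, v, w in sorted(edges, key=lambda e: (-e[2], e[0], e[1])):
        # Skip self-loops and edges whose endpoints have no capacity left.
        if u != v and remaining.get(u, 0) > 0 and remaining.get(v, 0) > 0:
            matched.append((u, v, w))
            remaining[u] -= 1
            remaining[v] -= 1
    return matched


edges = [("a", "b", 5), ("a", "c", 4), ("b", "c", 3), ("c", "d", 2)]
capacity = {"a": 1, "b": 2, "c": 2, "d": 1}
print(greedy_b_matching(edges, capacity))
```

A vertex-centric version would have to replace the global sort with rounds of proposals and acceptances between neighbours, which is where most of the real work is.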

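On the question quoted above about knowing the actual path: one common approach is to let each message carry the sender's id along with the candidate distance, so every vertex can remember its predecessor. The sketch below simulates the supersteps in plain Python rather than using the Giraph API; the function names are hypothetical, and it assumes non-negative edge weights and that every vertex appears as a key in the graph dict.

```python
import math


def sssp_with_paths(graph, source):
    """Superstep-style single-source shortest paths that also records predecessors.

    graph: dict mapping vertex -> list of (neighbor, edge_weight)
    Returns (dist, pred), two dicts keyed by vertex.
    """
    dist = {v: math.inf for v in graph}
    pred = {v: None for v in graph}
    # Superstep 0: only the source has a message, offering distance 0.
    inbox = {source: [(0, None)]}
    while inbox:  # stop when a superstep produces no messages
        outbox = {}
        for v, messages in inbox.items():
            # Each message is (candidate_distance, sender_id).
            best, sender = min(messages, key=lambda m: m[0])
            if best < dist[v]:
                dist[v], pred[v] = best, sender
                for nbr, w in graph[v]:
                    outbox.setdefault(nbr, []).append((best + w, v))
        inbox = outbox
    return dist, pred


def path_to(pred, source, target):
    """Walk predecessors back from target; returns None if target is unreachable."""
    path = [target]
    while path[-1] != source:
        prev = pred[path[-1]]
        if prev is None:
            return None
        path.append(prev)
    return path[::-1]


graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 1)],
    "c": [],
}
dist, pred = sssp_with_paths(graph, "s")
print(dist["c"], path_to(pred, "s", "c"))
```

The cost is a slightly larger message and vertex value; the path itself is only materialised afterwards by following the predecessor pointers.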

> It would be great to have examples on more advanced features: custom
> partitioning functions, aggregators, ...
> Personally, I'd like to see a side-by-side comparison of Google's Pregel as
> described in their paper and the Giraph implementation (I am particularly
> interested in where they diverge and why).
> Another question (or thing I am not so sure about) is about 'capacity
> planning' (sort of...). Given a dataset and an algorithm implemented in
> Giraph, how do you determine how many workers would be needed (in order to
> fit all your graph and messages for each superstep in RAM)?
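
For a first rough answer, a back-of-envelope calculation may help: add up the memory for vertices, edges and the messages of the heaviest superstep, then divide by the heap each worker can really use. The sketch below does only that arithmetic. Every per-object size in it is a placeholder assumption, not a measured Giraph figure, and the function name is made up; real numbers depend on the value types and on JVM overhead, so they would have to be measured on a sample.

```python
import math


def estimate_workers(num_vertices, num_edges, messages_per_superstep,
                     bytes_per_vertex, bytes_per_edge, bytes_per_message,
                     heap_bytes_per_worker, usable_fraction=0.7):
    """Rough lower bound on the number of workers needed to hold everything in RAM.

    All byte sizes are inputs the caller must estimate or measure;
    usable_fraction is an assumed share of the heap left for graph data.
    """
    total_bytes = (num_vertices * bytes_per_vertex
                   + num_edges * bytes_per_edge
                   + messages_per_superstep * bytes_per_message)
    usable_bytes = heap_bytes_per_worker * usable_fraction
    return math.ceil(total_bytes / usable_bytes)


# Hypothetical numbers: 1M vertices, 10M edges, one message per edge per
# superstep (as in PageRank), and 200 MB of heap per worker, half of it usable.
print(estimate_workers(1_000_000, 10_000_000, 10_000_000,
                       bytes_per_vertex=100, bytes_per_edge=20,
                       bytes_per_message=16,
                       heap_bytes_per_worker=200_000_000,
                       usable_fraction=0.5))
```

This ignores skew between partitions, so it should be read as a lower bound rather than a sizing rule.
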
> Last but not least, it seems to me that PageRank is what you use to
> 'benchmark' Giraph, is that the case? If so, sharing a common dataset for
> others to use would be a first step to allow people to compare the
> performance of different software running the very same algorithm, over the
> same data and on the same hardware infrastructure.
> Paolo
> Sebastian Schelter wrote:
> > Hi,
> >
> > I will give a talk titled "Large Scale Graph Processing with Apache
> > Giraph" in Berlin on May 29th. Details are available at:
> >
> > https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
> >
> > Best,
> > Sebastian
