FOSDEM has announced a devroom completely dedicated to Graph Processing:
I'm going to submit a talk proposal there. Here's the draft; feedback is welcome :)
Title: "Apache Giraph: distributed graph processing in the cloud."
Abstract: Web and online social graphs have been rapidly growing in
size and scale during the past decade. In 2008, Google estimated that
the number of web pages had exceeded one trillion. Online social
networking and email sites, including Yahoo!, Google, Microsoft,
Facebook, LinkedIn, and Twitter, have hundreds of millions of users
and are expected to grow much more in the future. Processing these
graphs plays a big role in delivering relevant and personalized
information to users, such as results from a search engine or news in
an online social networking site.
The Apache Giraph (http://incubator.apache.org/giraph) project is a
fault-tolerant, in-memory distributed graph processing system which runs
on top of a standard Hadoop cluster and is capable of running any
standard Bulk Synchronous Parallel (BSP) operation over any large
generic data set which can be represented as a graph. Apache Giraph is
a loose implementation of Google Pregel.
Giraph entered the ASF Incubator in July 2011, where it has enlisted
the aid of committers from Yahoo!, Facebook, LinkedIn, and Twitter.
The talk will first explain why running MapReduce jobs for graph
processing can be a problem, which is the reason Google designed
Pregel in the first place. Then the BSP model will be presented,
focusing on how it can be used to implement a distributed graph
processing engine.
The last part of the talk will be dedicated to Apache Giraph, with a
description of the programming model (i.e. the API and some typical
examples such as PageRank and Single Source Shortest Path) along with
a technical overview of Giraph's architecture and how it leverages
the Hadoop infrastructure.
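
For anyone giving feedback who hasn't seen the vertex-centric BSP model
before, here is a rough sketch of Single Source Shortest Path written as
a sequence of supersteps. It is plain Python run in a single process,
not the Giraph API: the function name sssp_bsp, the adjacency-dict
format and the inbox/outbox variables are all made up for illustration,
and it assumes non-negative edge weights.

```python
import math
from collections import defaultdict


def sssp_bsp(edges, source):
    """Toy single-process simulation of vertex-centric SSSP.

    edges maps each vertex to a list of (neighbor, weight) pairs.
    Returns (distance per vertex, number of supersteps executed).
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(edges)
    for outgoing in edges.values():
        for target, _ in outgoing:
            vertices.add(target)

    distance = {v: math.inf for v in vertices}

    # Messages to be delivered in the next superstep. The source is
    # "woken up" with a distance of 0.
    inbox = {source: [0.0]}
    supersteps = 0

    # One loop iteration is one superstep. Messages sent during a
    # superstep only become visible in the following one, which plays
    # the role of the BSP barrier.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                # Found a shorter path: record it and tell the neighbors.
                distance[vertex] = best
                for target, weight in edges.get(vertex, []):
                    outbox[target].append(best + weight)
            # Otherwise the vertex stays idle and sends nothing.
        inbox = outbox
        supersteps += 1

    # The computation halts when no messages are in flight.
    return distance, supersteps


if __name__ == "__main__":
    graph = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2), ("d", 6)],
        "c": [("d", 1)],
        "d": [],
    }
    print(sssp_bsp(graph, "a"))
```

The point of the sketch is that each vertex only looks at its own state
and its incoming messages, so the per-vertex work inside a superstep can
be spread across workers; in a real system the barrier between
supersteps is where the workers synchronize.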