Hi all, I have been taking a look at Giraph's source code. I noticed the heavy use of Writables in it and, even though I don't know many of the details of the project, I think it would be a good idea to at least consider using Pangool instead of the plain Java Hadoop API.
Pangool (http://pangool.net) is a low-level Java API on top of Hadoop that aims to make several things easier; one of them is dealing with compound types. Most of the others don't apply to Giraph, since you are running Map-only jobs. The most interesting part for Giraph is that you could have Vertices with plain Java types (Integer, Float, ... or arbitrary serializable objects) without having to worry about them being Writable. This would remove some code and complexity from the project. It would also allow for more expressive code, decoupled from Hadoop, where user functions (business logic) operate directly on Java types rather than on Hadoop types.

Pangool has been designed for performance, so it should perform on the same order as plain Hadoop (we ran a benchmark to show that). Pangool uses Avro for persisting data. It is being used successfully in production in some of our consulting projects (datasalt.com), so we contribute to it actively.

So, if this sounds interesting at all, I would be glad to submit a proposal as a patch and contribute. It would be a win-win situation, since Pangool would benefit a lot from being actively used by a serious open-source project like Giraph. Of course, many details would need to be discussed. Take this as a preliminary suggestion, just to see how it sounds. Feel free to raise any questions or concerns you may have.

Thanks,
Pere.
