Hello,

after my experiences with Giraph and Hadoop over the last few weeks, I would strongly suggest making a Maven archetype for a simple Giraph job available to new developers.
Figuring out how to change the provided Giraph examples so that they are error-free in an IDE, and then how to run a unit test and an InternalVertexRunner, is manageable. However, deploying that same code to a real Hadoop cluster can be very time-consuming and frustrating. There is a strong chance that a few people from my research unit will also need to learn about Giraph and Hadoop, and providing a Maven archetype is the way in which I would document my experiences for them.

For that archetype I would suggest the following contents:

* pom.xml which has dependencies on Hadoop, and which specifies the assembly instructions for a jar that Hadoop can use (not ./lib as everybody on the web says, but unpacked jars in / )
* empty vertex class which is a subclass of HashMapVertex (with comments to explain that other classes like BasicVertex should never be subclassed by the user)
* empty TextInputFormat
* empty TextOutputFormat
* empty class with run() and ToolRunner invocation, and comments to explain that this is an alternative to bin/giraph, and how to use bin/giraph for the same effect (also explain the more advanced things which a custom run() can do)
* make sure that all classes can be called through bin/giraph as well (and debug GiraphRunner if there still is some error)
* empty Test class using InternalVertexRunner
* everything should be able to run via the Test, the ToolRunner or bin/giraph as generated, without any further changes

I also consider this a good opportunity to learn about the best practices of using Giraph, and I think that I can probably work on that archetype in April. The archetype would be based on a cleaned-up and domain/use-case agnostic version of my code, which is currently here: https://github.com/2nd-metaman/sa-rdf-giraph

I am not sure how the archetype would be distributed, probably using the same infrastructure which is required for distributing a Giraph Maven artifact to the Apache Maven servers anyway.
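
To illustrate the jar assembly item from the list above, the relevant pom.xml fragment could look roughly like this. It is only a sketch, not the final archetype content: it assumes the stock maven-assembly-plugin with its predefined jar-with-dependencies descriptor, which unpacks the dependency classes into the root of the jar instead of placing jars under ./lib. The plugin version and the exact Hadoop and Giraph dependency entries are left open here.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- predefined descriptor: dependencies are unpacked into / of the jar -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <!-- build the assembled jar as part of "mvn package" -->
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, "mvn package" should produce an additional jar in target/ whose name ends in -jar-with-dependencies.jar, and that is the jar to hand to Hadoop.
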
Please let me know if you, as the Giraph community, think this is a good idea, and whether you have additions and/or changes to what should go inside the archetype.

cheers,
Benjamin