This is a great idea. Let's make it happen! -jg
On Thu, Mar 22, 2012 at 6:14 AM, Benjamin Heitmann <benjamin.heitm...@deri.org> wrote:
> Hello,
>
> After my experiences with giraph and hadoop in the last weeks, I would
> strongly suggest that a maven archetype for a simple giraph job
> should be made available for new developers.
>
> Figuring out how to change the provided giraph examples in order to make
> them error-free in an IDE, and then how to run a unit test and an
> InternalVertexRunner, is manageable.
>
> However, deploying that same code to a real hadoop cluster can be very
> time-consuming and frustrating.
>
> There is a strong chance that a few people from my research unit will also
> need to learn about giraph and hadoop, and providing a maven archetype is
> the way in which I would document my experiences for them.
>
> For that archetype I would suggest the following contents:
> * pom.xml which has dependencies on hadoop, and which specifies the assembly
>   instructions for a jar that hadoop can use
>   (not in ./lib as everybody on the web says, but as unpacked jars in / )
> * empty vertex class which is a subclass of HashMapVertex (with comments to
>   explain that other classes like BasicVertex should never be subclassed by
>   the user)
> * empty TextInputFormat
> * empty TextOutputFormat
> * empty class with run() and ToolRunner invocation, and comments to explain
>   that this is an alternative to bin/giraph, and how to use bin/giraph for
>   the same effect
>   (also explain the more advanced things which a custom run() can do)
> * make sure that all classes can be called through bin/giraph as well (and
>   debug GiraphRunner if there still is some error)
> * empty Test class using InternalVertexRunner
> * everything should be able to run via the Test, the ToolRunner or
>   bin/giraph, just without doing anything.
>
> I also consider this a good opportunity to learn about the best practices
> of using giraph, and I think that I can probably work on that archetype in
> April.
>
> The archetype would be based on a cleaned-up and domain/use-case-agnostic
> version of my code, which is currently here:
> https://github.com/2nd-metaman/sa-rdf-giraph
>
> I am not sure how that would be distributed, probably using the same
> infrastructure which is required for distributing a giraph maven artifact
> to the apache maven servers anyway.
>
> Please let me know if you as the giraph community think this is a good
> idea, and if you have additions and/or changes to what should go inside
> the archetype.
>
> cheers,
> Benjamin.
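The pom.xml item in the list above can be sketched as the following build section. This is a minimal sketch, not the eventual archetype's contents. It assumes the stock maven-assembly-plugin and its built-in jar-with-dependencies descriptor. That descriptor unpacks the project's compile and runtime dependencies into the root of a single jar, instead of placing them under lib/. The execution id make-assembly is an arbitrary name chosen for illustration. If the hadoop dependency is declared with `<scope>provided</scope>`, it is left out of the assembled jar, since the cluster already supplies it.

```xml
<!-- Sketch only: goes inside the archetype's pom.xml <project> element. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- Built-in descriptor: dependencies are unpacked into the jar root -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <!-- Illustrative id; bind the assembly to the normal package phase -->
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `mvn package` should produce `target/<artifactId>-<version>-jar-with-dependencies.jar` alongside the regular jar. The assembled jar is the one to hand to `hadoop jar` or bin/giraph.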