This is a great idea. Let's make it happen!
On Thu, Mar 22, 2012 at 6:14 AM, Benjamin Heitmann wrote:
> After my experiences with giraph and hadoop over the last few weeks, I would
> strongly suggest that a maven archetype for a simple giraph job
> should be made available for new developers.
> Figuring out how to change the provided giraph examples, in order to make
> them error-free in an IDE,
> and then how to run a unit test and an InternalVertexRunner is manageable.
> However, deploying that same code to a real hadoop cluster can be very
> time-consuming and frustrating.
> There is a strong chance that a few people from my research unit will also
> need to learn about giraph and hadoop,
> and providing a maven archetype is the way in which I would document my
> experiences for them.
> For that archetype I would suggest the following contents:
> * pom.xml which has dependencies on hadoop, and which specifies the assembly
> instructions for a jar that hadoop can use
> (not ./lib as everybody on the web says, but unpacked jars in / )
> * empty vertex class which is a subclass of HashMapVertex (with comments to
> explain that other classes like BasicVertex should never be subclassed by the
> * empty TextInputFormat
> * empty TextOutputFormat
> * empty class with run() and ToolRunner invocation, and comments to explain
> that this is an alternative to bin/giraph, and how to use bin/giraph for the
> same effect
> (also explain the more advanced things which a custom run() can do)
> * make sure that all classes can be called through bin/giraph as well (and
> debug GiraphRunner if there still is some error)
> * empty Test class using InternalVertexRunner
> * everything should be able to run via the Test, the ToolRunner or bin/giraph
> out of the box, even though none of it does anything yet.
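On the jar layout point in the list above (dependencies unpacked at the root of the job jar rather than shipped as nested jars under ./lib), a quick sanity check on the assembled jar could look roughly like the sketch below. This is only an illustration using the JDK's java.util.jar classes: JobJarLayoutCheck and its methods are hypothetical names, not part of giraph or hadoop, and the check simply assumes that any entry ending in ".jar" is a nested dependency that was not unpacked.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of giraph or hadoop.
class JobJarLayoutCheck {

    // Returns the entry names that are themselves jars, e.g. "lib/foo.jar".
    // Assumption: a nested jar means a dependency was not unpacked.
    static List<String> nestedJars(Collection<String> entryNames) {
        List<String> nested = new ArrayList<>();
        for (String name : entryNames) {
            if (name.toLowerCase().endsWith(".jar")) {
                nested.add(name);
            }
        }
        return nested;
    }

    // Lists the names of all non-directory entries in the given jar file.
    static List<String> entryNames(Path jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobJarLayoutCheck <job.jar>");
            System.exit(2);
        }
        List<String> nested = nestedJars(entryNames(Paths.get(args[0])));
        if (nested.isEmpty()) {
            System.out.println("No nested jars found.");
        } else {
            System.out.println("Nested jars found: " + nested);
            System.exit(1);
        }
    }
}
```

Running it against the jar produced by the archetype's assembly step would flag a layout that still carries nested jars, before the job is submitted to a cluster.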
> I also consider this a good opportunity to learn about the best practices of
> using giraph,
> and I think that I can probably work on that archetype in April.
> The archetype would be based on a cleaned up and domain/use-case agnostic
> version of my code which is currently here:
> I am not sure how that would be distributed, probably using the same
> which is required for distributing a giraph maven artefact to the apache
> maven servers anyway.
> Please let me know if you as the giraph community think this is a good idea,
> and if you have additions and/or changes to what should go inside of the
> archetype.
> cheers, Benjamin.