Sounds good to me. I would use EdgeListVertex as the parent class
instead of HashMapVertex (saves memory).
On 3/22/12 12:39 PM, Jakob Homan wrote:
This is a great idea. Let's make it happen!
On Thu, Mar 22, 2012 at 6:14 AM, Benjamin Heitmann wrote:
After my experiences with giraph and hadoop in the last weeks, I would strongly
suggest that a maven archetype for a simple giraph job
should be made available for new developers.
Figuring out how to change the provided giraph examples so that they are
error-free in an IDE,
and then how to run a unit test and an InternalVertexRunner, is manageable.
However, deploying that same code to a real hadoop cluster can be very
time-consuming and frustrating.
There is a strong chance that a few people from my research unit will also need
to learn about giraph and hadoop,
and providing a maven archetype is the way in which I would document my
experiences for them.
For that archetype I would suggest the following contents:
* pom.xml which has dependencies on hadoop, and which specifies the assembly
instructions for a jar that hadoop can use
(not ./lib as everybody on the web says, but unpacked jars in / )
* empty vertex class which is a subclass of HashMapVertex (with comments to
explain that other classes like BasicVertex should never be subclassed by the
* empty TextInputFormat
* empty TextOutputFormat
* empty class with run() and ToolRunner invocation, and comments to explain
that this is an alternative to bin/giraph, and how to use bin/giraph for the
(also explain the more advanced things which a custom run() can do)
* make sure that all classes can be called through bin/giraph as well (and
debug GiraphRunner if there still is some error)
* empty Test class using InternalVertexRunner
* everything should be able to run via the Test, the ToolRunner or bin/giraph,
just without doing anything.
I also consider this a good opportunity to learn about the best practices of
and I think that I can probably work on that archetype in April.
The archetype would be based on a cleaned up and domain/use-case agnostic
version of my code which is currently here:
I am not sure how that would be distributed, probably using the same
which is required for distributing a giraph maven artefact to the apache maven
Please let me know if you as the giraph community think this is a good idea,
and if you have additions and/or changes to what should go inside of the