This is a great idea.  Let's make it happen!
-jg

On Thu, Mar 22, 2012 at 6:14 AM, Benjamin Heitmann
<benjamin.heitm...@deri.org> wrote:
> Hello,
>
> after my experiences with giraph and hadoop over the last few weeks, I would
> strongly suggest that a maven archetype for a simple giraph job should be
> made available to new developers.
>
> Figuring out how to change the provided giraph examples so that they build
> without errors in an IDE, and then how to run a unit test and an
> InternalVertexRunner, is manageable.
>
> However, deploying that same code to a real hadoop cluster can be very
> time-consuming and frustrating.
>
> There is a strong chance that a few people from my research unit will also
> need to learn about giraph and hadoop, and providing a maven archetype is
> how I would document my experiences for them.
>
>
> For that archetype I would suggest the following contents:
> * pom.xml which declares the dependencies on hadoop and specifies the
> assembly instructions for a jar that hadoop can use
> (not ./lib, as everybody on the web says, but unpacked dependency jars in /)
> * empty vertex class which subclasses HashMapVertex (with comments explaining
> that other classes, such as BasicVertex, should never be subclassed by the
> user)
> * empty TextInputFormat
> * empty TextOutputFormat
> * empty class with a run() method and a ToolRunner invocation, with comments
> explaining that this is an alternative to bin/giraph and how to use
> bin/giraph for the same effect
> (also explaining the more advanced things a custom run() can do)
> * make sure that all classes can be called through bin/giraph as well (and
> debug GiraphRunner if there are still errors)
> * empty Test class using InternalVertexRunner
> * everything should be able to run via the Test, the ToolRunner, or
> bin/giraph, just without doing anything yet.
>
> I also consider this a good opportunity to learn the best practices for
> using giraph, and I think I can probably work on the archetype in April.
>
> The archetype would be based on a cleaned-up, domain- and use-case-agnostic
> version of my code, which is currently here:
>  https://github.com/2nd-metaman/sa-rdf-giraph
>
> I am not sure how it would be distributed, probably through the same
> infrastructure that is needed anyway to distribute a giraph maven artifact
> to the apache maven servers.
>
> Please let me know whether you, as the giraph community, think this is a
> good idea, and whether you have additions to or changes for what should go
> into the archetype.
>
>
> cheers, Benjamin.
