Jakob Homan updated GIRAPH-64:

    Attachment: GIRAPH-64.patch

Here's a patch that introduces that old bin folder we all know and lo{ve|athe}.
This also gives us the start of the packaging we'll need once we start thinking
about making releases. Users no longer have to merge their code into the Giraph
source to get it to run.
With the new bin/giraph, and assuming an implementation of Vertex such as this
one (taken from the PageRankBenchmark, obviously):
{code}package giraph1;

import java.util.Iterator;

import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

public class FirstVertex extends
    Vertex<LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
    /** Configuration from Configurable */
    private Configuration conf;

    /** How many supersteps to run */
    public static final String SUPERSTEP_COUNT = "PageRankBenchmark.superstepCount";

    public void preApplication()
        throws InstantiationException, IllegalAccessException {
    }

    public void postApplication() {
    }

    public void preSuperstep() {
    }

    public void compute(Iterator<DoubleWritable> msgIterator) {
        // From superstep 1 on, fold the incoming rank shares into a new value.
        if (getSuperstep() >= 1) {
            double sum = 0;
            while (msgIterator.hasNext()) {
                sum += msgIterator.next().get();
            }
            DoubleWritable vertexValue =
                new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
            setVertexValue(vertexValue);
        }

        // Until the configured superstep count is reached, split the value
        // evenly across the out-edges; after that, vote to halt.
        if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
            long edges = getNumOutEdges();
            sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));
        } else {
            voteToHalt();
        }
    }

    public Configuration getConf() {
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }
}
{code}
One can run it via:
{noformat}bin/giraph \
-DPageRankBenchmark.superstepCount=30 \
-DpseduoRandomVertexReader.aggregateVertices=220 \
-DpseduoRandomVertexReader.edgesPerVertex=37 \
~/kick-ass-vertex-1.0.jar giraph1.FirstVertex \
-w 10 \
-if org.apache.giraph.benchmark.PseudoRandomVertexInputFormat \
-of org.apache.giraph.lib.JsonBase64VertexOutputFormat \
-op output_path{noformat}
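The compute() method in FirstVertex is the usual PageRank update: from superstep 1 on, each vertex sets its value to 0.15/N plus 0.85 times the sum of its incoming messages, and until the configured superstep count it sends value/out-degree along every out-edge. To check that arithmetic without a Hadoop cluster, here is a hypothetical, dependency-free Java sketch that replays the same superstep schedule on an in-memory adjacency list. The class name PageRankSketch, the uniform 1/N starting value, and the sample graph are assumptions for illustration; this is not Giraph code.

```java
import java.util.Arrays;

/**
 * Hypothetical, dependency-free replay of FirstVertex.compute(); not Giraph code.
 * Vertices are 0..n-1 and outEdges[v] lists the targets of v's out-edges.
 */
public class PageRankSketch {

    /** Runs supersteps 0..superstepCount inclusive and returns the final values. */
    public static double[] run(int[][] outEdges, int superstepCount) {
        int n = outEdges.length;
        double[] value = new double[n];
        Arrays.fill(value, 1.0 / n);      // assumption: uniform starting value
        double[] inbox = new double[n];   // messages delivered at this superstep

        for (int superstep = 0; superstep <= superstepCount; superstep++) {
            // Mirrors "if (getSuperstep() >= 1)": fold messages into the value.
            if (superstep >= 1) {
                for (int v = 0; v < n; v++) {
                    value[v] = 0.15 / n + 0.85 * inbox[v];
                }
            }
            // Mirrors "if (getSuperstep() < SUPERSTEP_COUNT)": send value / degree.
            // Messages only become visible in the next superstep (BSP semantics).
            double[] outbox = new double[n];
            if (superstep < superstepCount) {
                for (int v = 0; v < n; v++) {
                    for (int target : outEdges[v]) {
                        outbox[target] += value[v] / outEdges[v].length;
                    }
                }
            }
            inbox = outbox;
        }
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> {1, 2}, 1 -> {0}, 2 -> {0}.
        int[][] graph = {{1, 2}, {0}, {0}};
        System.out.println(Arrays.toString(run(graph, 30)));
    }
}
```

When no vertex is dangling, the values keep summing to 1, so for example a three-vertex cycle stays at 1/3 per vertex at every superstep.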
bin/giraph is heavily cribbed from Mahout and Pig, btw.
Is there any reason the fatjar approach was taken, other than expediency? This
patch uses the fatjar approach for testing but a standard lib-folder approach
for the actual package. I'd like to remove the fatjar entirely.

This is a rough script and will need lots of enhancements as we go, but I think 
it's a good start.
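The lib-folder packaging means a launcher has to assemble a classpath from the jars shipped in lib/ plus the user's jar, rather than relying on one fat jar. Launchers in the Mahout and Pig style typically do this with a shell loop over lib/*.jar. As a hypothetical illustration only (the class LibClasspath, its signature, and the user-jar-first ordering are assumptions, not taken from the patch), the same idea in Java looks like this:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of the patch: builds a lib-folder classpath. */
public class LibClasspath {

    /**
     * Returns userJar followed by every *.jar directly inside libDir, sorted by
     * path for a deterministic order, joined with the platform path separator.
     */
    public static String build(Path libDir, Path userJar) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar);
            }
        }
        Collections.sort(jars);

        List<String> entries = new ArrayList<>();
        entries.add(userJar.toString()); // assumption: user classes take precedence
        for (Path jar : jars) {
            entries.add(jar.toString());
        }
        return String.join(File.pathSeparator, entries);
    }
}
```

Putting the user's jar first lets its classes win over same-named classes in lib/; putting it last is the opposite, equally defensible choice.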
> Create VertexRunner to make it easier to run users' computations
> ----------------------------------------------------------------
>                 Key: GIRAPH-64
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-64
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-64.patch
> Currently, if a user wants to implement a Giraph algorithm by extending
> {{Vertex}}, they must also write all the boilerplate around the {{Tool}}
> interface and bundle it with the Giraph jar (or get Giraph on the classpath
> and playing nicely with the implementation). For examples, see what Kohei has
> done at https://github.com/smly/java-Giraph-LabelPropagation and what is
> included in the PageRankBenchmark. It would be better to have a Vertex
> implementation to subclass that already includes all the standard Tooling,
> so that all one had to run would be (assuming the Giraph jar was already on
> the classpath):
> {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex -i jazz_input -o 
> jazz_output -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble 
> -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat} 
> This wouldn't work with every algorithm, but would be useful in a large 
> number of cases.
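The proposal amounts to moving the Tool boilerplate into a reusable base class, so a subclass supplies only its own logic. A minimal, hypothetical sketch of that template-method idea in plain Java follows; the class name ToolingBase, the runJob hook, and the strict "-flag value" parsing are assumptions, and it deliberately leaves out Hadoop's Tool and ToolRunner, so it is not how Giraph implements this. With `hadoop jar`, the jar and the main-class name are consumed by Hadoop, so run() only sees the flags.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the "standard Tooling" idea: the base class owns the
 * command-line handling and a subclass only implements runJob(). Flag names
 * follow the proposed command line (-i, -o, -if, -of); not Giraph's API.
 */
public abstract class ToolingBase {
    /** Flags every job must supply, per the proposed command line. */
    private static final String[] REQUIRED = {"-i", "-o", "-if", "-of"};

    /** Parsed "-flag value" pairs, e.g. "-if" -> input format class name. */
    protected final Map<String, String> options = new HashMap<>();

    /** Parses and validates args, then hands off to the subclass hook. */
    public final int run(String[] args) {
        for (int k = 0; k < args.length; k++) {
            String flag = args[k];
            if (!flag.startsWith("-") || k + 1 >= args.length) {
                throw new IllegalArgumentException("Expected '-flag value', got: " + flag);
            }
            options.put(flag, args[++k]);
        }
        for (String flag : REQUIRED) {
            if (!options.containsKey(flag)) {
                throw new IllegalArgumentException("Missing required option " + flag);
            }
        }
        return runJob(options.get("-i"), options.get("-o"),
                      options.get("-if"), options.get("-of"));
    }

    /** Subclass hook: where a real runner would configure and submit the job. */
    protected abstract int runJob(String input, String output,
                                  String inputFormat, String outputFormat);
}
```

A user's vertex class would then only need a main() that calls run(args) on an instance of its ToolingBase subclass and exits with the returned status.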
