[
https://issues.apache.org/jira/browse/GIRAPH-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanyuan Tian updated GIRAPH-818:
---------------------------------
Description:
This patch an extension based on my paper, From "Think Like a Vertex" to "Think
Like a Graph", published in PVLDB 2013
(http://researcher.watson.ibm.com/researcher/files/us-ytian/giraph++.pdf).
The basic motivation is as follows. Giraphs divides input graphs into
partitions, and employs a “think like a vertex" programming model to support
iterative graph computation. This vertex-centric model is easy to program and
has been proved useful for many graph algorithms. However, this model hides the
partitioning information from the users, thus prevents many algorithm-specific
optimizations. This often results in longer execution time due to excessive
network messages. To address this limitation, we propose a new “think like a
graph" programming paradigm. Under this graph-centric model, the partition
structure is opened up to the users, and can be utilized so that communication
within a partition can bypass the heavy message passing. For example, on a web
graph with 118 million vertices and 855 million edges, the graph-centric
version of connected component detection algorithm runs 63X faster and uses
204X fewer network messages than its vertex-centric counterpart. For more
details of this new model, please refer to the paper.
Note that since the work is done last year, the extension is based on an old
version of Giraph downloaded in June 2012. So, the patch essential rolls back
all the changes in Giraph since June 2012. Performing a diff with a June 2012
version of Giraph will show what changes I made to support the extension.
To compile the code, simply run “mvn compile”, then the jar file named
giraph-0.2-SNAPSHOT-jar-with-dependencies.jar is generated under the target
directory. Also note that while I was creating the patch and run “mvn clean
verify”, testBspPageRank and testBspPageRankWithAggregatorWriter failed.
In this patch, I also included 4 example algorithms to test the new
graph-centric model. To test-run these examples, please download the two
example graphs: enron.tgz (the enron email graph) and enron_undirected.tgz (an
undirected version of the enron graph). Unzip them and put them on hdfs. Then
these are the command to run the tests:
- compute the NCut amd Imbalande factor of a partitioning scheme
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.NCut enron 7 true 7 output
- weakly connected component on an undirected graph
- the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.wcc.WCCGraph enron_undirected out 7 true 7
the hybrid model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.wcc.WCCVertex enron_undirected out 7 true 7 true
- the basic vertex-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.wcc.WCCVertex enron_undirected out 7 true 7 false
- pagerank on a directed graph
- the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.pagerank.DeltaPRGraph enron output 7 true 7
- the hybrid model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.pagerank.DeltaPRVertex enron output 7 true 7 true
- the vertex-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.pagerank.DeltaPRVertex enron output 7 true 7 false
- graph coarsening on an undirected graph
- the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.coarsen.CoarsenGraph enron_undirected 7 output 7
- the vertex-centricl mdoel
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.coarsen.CoarsenVertex enron_undirected 7 output 7
was:
This patch an extension based on my paper, From "Think Like a Vertex" to "Think
Like a Graph", published in PVLDB 2013
(http://researcher.watson.ibm.com/researcher/files/us-ytian/giraph++.pdf).
The basic motivation is as follows. Giraphs divides input graphs into
partitions, and employs a “think like a vertex" programming model to support
iterative graph computation. This vertex-centric model is easy to program and
has been proved useful for many graph algorithms. However, this model hides the
partitioning information from the users, thus prevents many algorithm-specific
optimizations. This often results in longer execution time due to excessive
network messages. To address this limitation, we propose a new “think like a
graph" programming paradigm. Under this graph-centric model, the partition
structure is opened up to the users, and can be utilized so that communication
within a partition can bypass the heavy message passing. For example, on a web
graph with 118 million vertices and 855 million edges, the graph-centric
version of connected component detection algorithm runs 63X faster and uses
204X fewer network messages than its vertex-centric counterpart. For more
details of this new model, please refer to the paper.
Note that since the work is done last year, the extension is based on an old
version of Giraph downloaded in June 2012. So, the patch essential rolls back
all the changes in Giraph since June 2012. Performing a diff with a June 2012
version of Giraph will show what changes I made to support the extension.
To compile the code, simply run “mvn compile”, then the jar file named
giraph-0.2-SNAPSHOT-jar-with-dependencies.jar is generated under the target
directory. Also note that while I was creating the patch and run “mvn clean
verify”, testBspPageRank and testBspPageRankWithAggregatorWriter failed.
In this patch, I also included 4 example algorithms to test the new
graph-centric model. To test-run these examples, please download the two
example graphs: enron.tgz (the enron email graph) and enron_undirected.tgz (an
undirected version of the enron graph). Unzip them and put them on hdfs. Then
these are the command to run the tests:
# compute the NCut amd Imbalande factor of a partitioning scheme
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.NCut enron 7 true 7 output
# weakly connected component on an undirected graph
# the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.wcc.WCCGraph enron_undirected out 7 true 7
# the hybrid model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.wcc.WCCVertex enron_undirected out 7 true 7 true
# the basic vertex-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.wcc.WCCVertex enron_undirected out 7 true 7 false
# pagerank on a directed graph
# the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.pagerank.DeltaPRGraph enron output 7 true 7
# the hybrid model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.pagerank.DeltaPRVertex enron output 7 true 7 true
# the vertex-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.pagerank.DeltaPRVertex enron output 7 true 7 false
# graph coarsening on an undirected graph
# the graph-centric model
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.coarsen.CoarsenGraph enron_undirected 7 output 7
# the vertex-centricl mdoel
hadoop-0.20.203.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
com.ibm.giraph.graph.example.coarsen.CoarsenVertex enron_undirected 7 output 7
> Extending Giraph with a Graph-Centric Programming API
> -----------------------------------------------------
>
> Key: GIRAPH-818
> URL: https://issues.apache.org/jira/browse/GIRAPH-818
> Project: Giraph
> Issue Type: Improvement
> Components: benchmark
> Affects Versions: 1.1.0
> Reporter: Yuanyuan Tian
> Labels: patch
> Fix For: 1.1.0
>
> Attachments: GIRAPH-798.patch, enron.tgz, enron_undirected.tgz
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> This patch an extension based on my paper, From "Think Like a Vertex" to
> "Think Like a Graph", published in PVLDB 2013
> (http://researcher.watson.ibm.com/researcher/files/us-ytian/giraph++.pdf).
> The basic motivation is as follows. Giraphs divides input graphs into
> partitions, and employs a “think like a vertex" programming model to support
> iterative graph computation. This vertex-centric model is easy to program and
> has been proved useful for many graph algorithms. However, this model hides
> the partitioning information from the users, thus prevents many
> algorithm-specific optimizations. This often results in longer execution time
> due to excessive network messages. To address this limitation, we propose a
> new “think like a graph" programming paradigm. Under this graph-centric
> model, the partition structure is opened up to the users, and can be utilized
> so that communication within a partition can bypass the heavy message
> passing. For example, on a web graph with 118 million vertices and 855
> million edges, the graph-centric version of connected component detection
> algorithm runs 63X faster and uses 204X fewer network messages than its
> vertex-centric counterpart. For more details of this new model, please refer
> to the paper.
> Note that since the work is done last year, the extension is based on an old
> version of Giraph downloaded in June 2012. So, the patch essential rolls back
> all the changes in Giraph since June 2012. Performing a diff with a June 2012
> version of Giraph will show what changes I made to support the extension.
> To compile the code, simply run “mvn compile”, then the jar file named
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar is generated under the target
> directory. Also note that while I was creating the patch and run “mvn clean
> verify”, testBspPageRank and testBspPageRankWithAggregatorWriter failed.
> In this patch, I also included 4 example algorithms to test the new
> graph-centric model. To test-run these examples, please download the two
> example graphs: enron.tgz (the enron email graph) and enron_undirected.tgz
> (an undirected version of the enron graph). Unzip them and put them on hdfs.
> Then these are the command to run the tests:
> - compute the NCut amd Imbalande factor of a partitioning scheme
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.NCut enron 7 true 7 output
> - weakly connected component on an undirected graph
> - the graph-centric model
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.wcc.WCCGraph enron_undirected out 7 true 7
> the hybrid model
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.wcc.WCCVertex enron_undirected out 7 true 7 true
> - the basic vertex-centric model
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.wcc.WCCVertex enron_undirected out 7 true 7
> false
> - pagerank on a directed graph
> - the graph-centric model
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.pagerank.DeltaPRGraph enron output 7 true 7
> - the hybrid model
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.pagerank.DeltaPRVertex enron output 7 true 7 true
> - the vertex-centric model
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.pagerank.DeltaPRVertex enron output 7 true 7
> false
> - graph coarsening on an undirected graph
> - the graph-centric model
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.coarsen.CoarsenGraph enron_undirected 7 output 7
> - the vertex-centricl mdoel
> hadoop-0.20.203.0/bin/hadoop jar
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar
> com.ibm.giraph.graph.example.coarsen.CoarsenVertex enron_undirected 7 output 7
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)