-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9811/
-----------------------------------------------------------
(Updated March 16, 2013, 7:22 a.m.)
Review request for giraph.
Changes
-------
Hey guys. This update fixes a small bug in which the integration test's
MiniYARNCluster borrowed same-named test dirs from running test instances of
InternalVertexRunner, which also caused port collisions on 22181 and 22182.
All fixed. I ran it a bunch of times and it seems to be working well now.
The command line I gave in the last RB post here is a bit bonkers. Here's an
example I used today to run connected components on a synthetic graph with
3 million vertices. It took about 52 seconds on average:
{code}
# From your Giraph source tree, assuming Hadoop-2.0.3-alpha is up and you can
# run wordcount on it.
# Your Hadoop cluster can be a local single node or a real cluster.
mvn -Phadoop_yarn clean package
cp giraph-examples/target/giraph-examples*.jar ~/hadoop/share/hadoop/giraph/
cd ~/hadoop
# This will instantiate 5 YARN container processes: 1 Application Master to
# manage the job, 1 master node, and 3 worker nodes.
# As always, you pick the number of workers with the -w option, but keep in
# mind there will be 2 more processes: the master and the app master.
# Remember this if the cluster fails due to lack of memory: allocate only as
# many workers as leave room for a master with the same amount of heap (the
# -yh option) AND an app master running on a gig of memory.
hadoop --config etc/hadoop jar \
  share/hadoop/giraph/giraph-examples-0.2-SNAPSHOT-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.ConnectedComponentsVertex -w 3 \
  -yj giraph-examples-0.2-SNAPSHOT-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar \
  -yh 1024 -vif org.apache.giraph.io.formats.IntIntNullIntTextInputFormat \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -vip graph3mil -op demoOutput8
{code}
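As a rough sketch of the container arithmetic described in the comments above
(this is not part of the patch): the helper name giraph_yarn_budget and its
arguments are made up for illustration, and the 1024 MB app master default
just mirrors the "gig of memory" mentioned above. It treats the -yh heap value
as the size of each worker and master container, so take the total as a lower
bound rather than what YARN will actually reserve.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Giraph): estimate how many YARN
# containers a Giraph job needs and roughly how much memory they need in total.
#   $1 = number of workers (the -w option)
#   $2 = heap per worker/master in MB (the -yh option)
#   $3 = app master memory in MB (assumed default: 1024)
giraph_yarn_budget() {
  workers=$1
  heap_mb=$2
  am_mb=${3:-1024}
  # Workers plus 2 extra processes: the Giraph master and the app master.
  containers=$((workers + 2))
  # The master gets the same heap as each worker; the app master is separate.
  total_mb=$(( (workers + 1) * heap_mb + am_mb ))
  echo "$containers $total_mb"
}

# The example job above: 3 workers at 1024 MB each.
giraph_yarn_budget 3 1024   # prints "5 5120"
```

If the second number is more than your cluster can hand out, lower -w or -yh
before launching.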
Description
-------
Port Giraph to "pure YARN" clusters, using Hadoop MapReduce classes in our code
(IO formats etc.) but running the cluster job without any active participation
by a running MapReduce framework. This means doing some things ourselves that
Hadoop used to do for us.
I am putting this up for review to aid some non-Giraphers in having a peek at
the YARN component. There is a bit of latency in the job launch that I am still
diagnosing. I am also still finishing up an integration test to verify the YARN
components can run a no-op Giraph job successfully. All BSP code is covered by
our MRv1 tests, which are sufficient because once Giraph is running, it does
not know or care whether it's running on YARN. The grand total for the entire
patch is TWO files with FOUR actual munges. All the rest is conditionally
compiled and/or manipulated through conf settings without ever calling into
YARN-specific code from inside Giraph. This will allow us to hold off on
ripping apart our IO formats or other MRv1 baked-in dependencies until we're
ready to abandon MR. It also sets up a paradigm by which it will be easy to
port us to other cluster frameworks (Mesos, etc.).
I will ping Giraph folks when this is really ready for review (hopefully in the
next day or so), but feel free to drop me a line now if you see something you are
curious about or just plain don't like. The sooner I fix it, the sooner this
gets committed, so please speak up if you do.
My goal is to make this not only our port to YARN, but another (there aren't
many) good and well-commented example of how to run "real applications" like
Giraph on YARN clusters. So I'm hoping it's clear and easy to follow on that
level too. Happy to hear feedback on that angle as well!
Thanks! Will post a wiki page explaining a bit more about this when it's all
finished. This version still depends on Hadoop-2.0.3-alpha, but I will
attempt to backport to 2.0.2 before I'm done, and a future JIRA should bring
us to 2.0.0 or higher (and trunk, of course).
Diffs (updated)
-----
checkstyle.xml 3d8a6d4
giraph-core/pom.xml 3580d0c
giraph-core/src/main/java/org/apache/giraph/GiraphRunner.java 5bd5686
giraph-core/src/main/java/org/apache/giraph/bsp/BspInputFormat.java bce84b1
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 6886d58
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java ad9073d
giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java e74c59a
giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 87497b8
giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java 41238d0
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 74c1f87
giraph-core/src/main/java/org/apache/giraph/yarn/GiraphApplicationMaster.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnClient.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnTask.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/yarn/YarnUtils.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/yarn/package-info.java PRE-CREATION
giraph-core/src/test/java/org/apache/giraph/yarn/TestYarnJob.java PRE-CREATION
giraph-core/src/test/resources/capacity-scheduler.xml PRE-CREATION
giraph-examples/pom.xml 3b6a08c
pom.xml 8d29304
Diff: https://reviews.apache.org/r/9811/diff/
Testing
-------
Getting there; an in-progress integration test is included for your amusement.
Thanks,
Eli Reisman