[ 
https://issues.apache.org/jira/browse/GIRAPH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Fonseca updated GIRAPH-811:
-------------------------------------

    Description: 
While executing the SimpleShortestPaths example with Giraph 1.1.0-SNAPSHOT 
compiled for Hadoop YARN 2.2.0, I noticed that the application never 
terminates, even after it recognizes that all supersteps have completed and the 
output has been written to the output directory.

Looking at the logs, I found that the BspServiceMaster is stuck in the while 
loop at the end of cleanUpZooKeeper() (BspServiceMaster.java:1729):

{code}2013-12-08 03:51:21,698 INFO  [org.apache.giraph.master.MasterThread] 
master.MasterThread (MasterThread.java:run(121)) - masterThread: Coordination 
of superstep 3 took 0.433 seconds ended with state ALL_SUPERSTEPS_DONE and is 
now on superstep 4
2013-12-08 03:51:21,699 INFO  [org.apache.giraph.master.MasterThread] 
master.BspServiceMaster (BspServiceMaster.java:setJobState(261)) - setJobState: 
{"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on 
superstep 4
2013-12-08 03:51:21,753 INFO  [org.apache.giraph.master.MasterThread] 
master.BspServiceMaster (BspServiceMaster.java:cleanup(1836)) - cleanup: 
Notifying master its okay to cleanup with 
/_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir/0_master
2013-12-08 03:51:21,790 INFO  [org.apache.giraph.master.MasterThread] 
master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1711)) - 
cleanUpZooKeeper: Node 
/_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir already 
exists, no need to create.
2013-12-08 03:51:21,792 INFO  [org.apache.giraph.master.MasterThread] 
bsp.BspInputFormat (BspInputFormat.java:getMaxTasks(64)) - getMaxTasks: Max 
workers = 1, split master/worker = true, is YARN-only job = true, total max 
tasks = 1
2013-12-08 03:51:21,792 INFO  [org.apache.giraph.master.MasterThread] 
master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1735)) - 
cleanUpZooKeeper: Got 2 of 1 desired children from 
/_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir
2013-12-08 03:51:21,793 INFO  [org.apache.giraph.master.MasterThread] 
master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1744)) - 
cleanedUpZooKeeper: Waiting for the children of 
/_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir to change 
since only got 2 nodes.{code}

As the last 2 entries show, the master finds 2 children under _cleanedUpDir 
while expecting only 1, so the equality condition on line 1740 is never met and 
the loop keeps waiting.
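
To illustrate why the loop cannot exit, here is a minimal sketch of the two 
possible exit checks. The class and method names are hypothetical and this is a 
simplified model of the behaviour described above, not the actual code in 
BspServiceMaster.java. The values 2 and 1 are taken from the log.

```java
// Hypothetical, simplified model of the cleanup wait; not Giraph's actual code.
final class CleanupWaitSketch {
    /** Strict check as described in this report: exit only on an exact match. */
    static boolean doneStrict(int childCount, int expectedTasks) {
        return childCount == expectedTasks;
    }

    /** Relaxed check: exit once at least the expected number of tasks reported. */
    static boolean doneRelaxed(int childCount, int expectedTasks) {
        return childCount >= expectedTasks;
    }

    public static void main(String[] args) {
        // Values from the log: getMaxTasks reported 1, _cleanedUpDir held 2 children.
        int expectedTasks = 1;
        int childCount = 2;
        System.out.println("strict:  " + doneStrict(childCount, expectedTasks));
        System.out.println("relaxed: " + doneRelaxed(childCount, expectedTasks));
    }
}
```

With 2 children and 1 expected task, the strict check stays false. Since the 
child count is already above the expected value, further cleanup 
notifications cannot make it true either.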

One solution would be to change the == on line 1740 to >=. However, the 
actual issue seems to lie in BspInputFormat.getMaxTasks() 
(BspInputFormat.java:51). This function assumes that in a pure YARN execution 
the total number of tasks equals the maximum number of workers. 
However, based on GiraphApplicationMaster:167, this is not the case: an extra 
Master task is launched in addition to all the Worker tasks. 
BspInputFormat.getMaxTasks() should therefore return maxWorkers + 1 in the case 
of a pure YARN execution.
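
A minimal sketch of that proposal follows. It models only the pure YARN case 
discussed here. The class and method names are hypothetical, and the real 
getMaxTasks() reads its inputs from the job configuration rather than taking 
them as arguments.

```java
// Hypothetical model of the pure YARN task count; not the real BspInputFormat.
final class MaxTasksSketch {
    /** Behaviour reported above: only the workers are counted. */
    static int reportedMaxTasks(int maxWorkers) {
        return maxWorkers;
    }

    /** Proposed behaviour: count the workers plus the separate master task. */
    static int proposedMaxTasks(int maxWorkers) {
        return maxWorkers + 1;
    }

    public static void main(String[] args) {
        int maxWorkers = 1; // corresponds to "-w 1" in the execution command
        System.out.println("reported: " + reportedMaxTasks(maxWorkers));
        System.out.println("proposed: " + proposedMaxTasks(maxWorkers));
    }
}
```

For -w 1 the proposed value is 2, which matches the 2 children the master 
found under _cleanedUpDir.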

Compilation:
{code}mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests compile{code}

Execution command:
{code}$HADOOP_PREFIX/bin/hadoop jar 
~/Projects/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
 org.apache.giraph.GiraphRunner 
org.apache.giraph.examples.SimpleShortestPathsComputation -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
giraph/input/tiny_graph.txt -vof 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
giraph/output/shortestpahts -w 1 -ca giraph.zkList=localhost:2181 -yj 
giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar{code}


> Infinite ZooKeeper CleanUp
> --------------------------
>
>                 Key: GIRAPH-811
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-811
>             Project: Giraph
>          Issue Type: Bug
>          Components: bsp, zookeeper
>    Affects Versions: 1.1.0
>            Reporter: Alexandre Fonseca
>              Labels: yarn
>         Attachments: GIRAPH-811.patch



--
This message was sent by Atlassian JIRA
(v6.1#6144)
