Repository: giraph
Updated Branches:
  refs/heads/trunk d827c97fc -> 2185f5946


GIRAPH-1076 Race condition in FileTxnSnapLog

Summary:
org.apache.zookeeper.server.persistence.FileTxnSnapLog has a potential race
condition when it creates its data directory:

    if (!this.dataDir.exists()) {
        if (!this.dataDir.mkdirs()) {
            throw new IOException("Unable to create data directory " + this.dataDir);
        }
    }

If two threads try to create a FileTxnSnapLog simultaneously, one of them can
throw an IOException: both see that the directory does not exist, one creates
it, and mkdirs() then returns false for the other.
We saw this happen in Giraph, where a FileTxnSnapLog is created both by the
PurgeTask (created by DatadirCleanupManager) and by
InProcessZooKeeperRunner#runFromConfig.
Until the ZooKeeper code is fixed, if it ever is, we need to make sure
ZooKeeper starts first and only then start the PurgeTask.
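
For illustration only, here is a minimal sketch of the check-then-create
pattern described above next to one race-tolerant alternative. It is not the
ZooKeeper code and not the change made in this commit (which only reorders
startup). The class name DataDirRaceSketch, its methods, and the "version-2"
directory name are hypothetical and exist only for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class DataDirRaceSketch {

    // Same shape as the snippet quoted in the summary. If another thread
    // creates the directory between exists() and mkdirs(), mkdirs() returns
    // false and this method throws even though the directory is there.
    public static void ensureDirRacy(File dir) throws IOException {
        if (!dir.exists()) {
            if (!dir.mkdirs()) {
                throw new IOException("Unable to create data directory " + dir);
            }
        }
    }

    // Hypothetical alternative: a false return from mkdirs() is an error
    // only if the directory still does not exist afterwards.
    public static void ensureDirTolerant(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }

    // Releases all threads at once so they contend on the same directory,
    // then returns how many of them saw an IOException.
    public static int countFailures(File dir, int threads)
            throws InterruptedException {
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    startGate.await();
                    ensureDirTolerant(dir);
                } catch (IOException e) {
                    failures.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        startGate.countDown();
        for (Thread worker : workers) {
            worker.join();
        }
        return failures.get();
    }

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("datadir-race").toFile();
        File dataDir = new File(base, "version-2");
        int failures = countFailures(dataDir, 8);
        System.out.println("failures with tolerant creation: " + failures);
    }
}
```

Swapping ensureDirTolerant for ensureDirRacy inside countFailures may
reproduce the IOException on some runs, but the failure is timing dependent
and will not show up every time.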

Test Plan: run a few jobs and mvn clean verify

Reviewers: majakabiljo, dionysis.logothetis, heslami, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59883


Project: http://git-wip-us.apache.org/repos/asf/giraph/repo
Commit: http://git-wip-us.apache.org/repos/asf/giraph/commit/2185f594
Tree: http://git-wip-us.apache.org/repos/asf/giraph/tree/2185f594
Diff: http://git-wip-us.apache.org/repos/asf/giraph/diff/2185f594

Branch: refs/heads/trunk
Commit: 2185f5946edfddcca8a5bcb76160212bfe2ef797
Parents: d827c97
Author: Sergey Edunov <[email protected]>
Authored: Tue Jun 21 10:14:34 2016 -0700
Committer: Sergey Edunov <[email protected]>
Committed: Tue Jun 21 10:14:34 2016 -0700

----------------------------------------------------------------------
 .../org/apache/giraph/zk/InProcessZooKeeperRunner.java  | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/giraph/blob/2185f594/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
----------------------------------------------------------------------
diff --git a/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java b/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
index 9502c24..4f15f3a 100644
--- a/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
+++ b/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
@@ -88,16 +88,22 @@ public class InProcessZooKeeperRunner
      * @throws IOException if can't start zookeeper
      */
     public int start(ZookeeperConfig config) throws IOException {
+      serverRunner = new ZooKeeperServerRunner();
+      //Make sure zookeeper starts first and purge manager last
+      //This is important because zookeeper creates a folder
+      //strucutre on the local disk. Purge manager also tries
+      //to create it but from a different thread and can run into
+      //race condition. See FileTxnSnapLog source code for details.
+      int port = serverRunner.start(config);
       // Start and schedule the the purge task
       DatadirCleanupManager purgeMgr = new DatadirCleanupManager(
           config
-          .getDataDir(), config.getDataLogDir(),
+              .getDataDir(), config.getDataLogDir(),
           GiraphConstants.ZOOKEEPER_SNAP_RETAIN_COUNT,
           GiraphConstants.ZOOKEEPER_PURGE_INTERVAL);
       purgeMgr.start();
 
-      serverRunner = new ZooKeeperServerRunner();
-      return serverRunner.start(config);
+      return port;
     }
 
     /**
