Repository: giraph
Updated Branches:
refs/heads/trunk d827c97fc -> 2185f5946
GIRAPH-1076 Race condition in FileTxnSnapLog
Summary:
org.apache.zookeeper.server.persistence.FileTxnSnapLog contains a potential
race condition:
    if (!this.dataDir.exists()) {
        if (!this.dataDir.mkdirs()) {
            throw new IOException("Unable to create data directory " +
                this.dataDir);
        }
    }
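If two threads construct a FileTxnSnapLog at the same time, both can see that
the directory does not exist. One mkdirs() call then succeeds and the other
returns false, which triggers the IOException.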
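For illustration only, here is a minimal sketch of how such a check could be made tolerant of a concurrent creator. The class and method names (DataDirs, ensureDirectory) are hypothetical and are not part of ZooKeeper or Giraph. This is not the fix applied in this commit, which only changes the startup order.

```java
import java.io.File;
import java.io.IOException;

public class DataDirs {
    /**
     * Hypothetical helper: create the directory if needed and fail only
     * when it still does not exist afterwards.
     */
    public static void ensureDirectory(File dir) throws IOException {
        // mkdirs() returns false when the directory already exists, which
        // includes the case where another thread created it a moment ago.
        // Re-checking isDirectory() separates that case from a real failure.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }
}
```

With this shape, a caller that loses the race sees an existing directory and continues instead of throwing.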
We saw this happen in Giraph, where FileTxnSnapLog is created both by the
PurgeTask that DatadirCleanupManager schedules and by
InProcessZooKeeperRunner#runFromConfig.
Until the ZooKeeper code is fixed, if it ever is, we need to make sure
ZooKeeper starts first and PurgeTask starts only afterwards.
Test Plan: run a few jobs and mvn clean verify
Reviewers: majakabiljo, dionysis.logothetis, heslami, maja.kabiljo
Reviewed By: maja.kabiljo
Differential Revision: https://reviews.facebook.net/D59883
Project: http://git-wip-us.apache.org/repos/asf/giraph/repo
Commit: http://git-wip-us.apache.org/repos/asf/giraph/commit/2185f594
Tree: http://git-wip-us.apache.org/repos/asf/giraph/tree/2185f594
Diff: http://git-wip-us.apache.org/repos/asf/giraph/diff/2185f594
Branch: refs/heads/trunk
Commit: 2185f5946edfddcca8a5bcb76160212bfe2ef797
Parents: d827c97
Author: Sergey Edunov <[email protected]>
Authored: Tue Jun 21 10:14:34 2016 -0700
Committer: Sergey Edunov <[email protected]>
Committed: Tue Jun 21 10:14:34 2016 -0700
----------------------------------------------------------------------
.../org/apache/giraph/zk/InProcessZooKeeperRunner.java | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/giraph/blob/2185f594/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
----------------------------------------------------------------------
diff --git a/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java b/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
index 9502c24..4f15f3a 100644
--- a/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
+++ b/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
@@ -88,16 +88,22 @@ public class InProcessZooKeeperRunner
* @throws IOException if can't start zookeeper
*/
public int start(ZookeeperConfig config) throws IOException {
+ serverRunner = new ZooKeeperServerRunner();
+ //Make sure zookeeper starts first and purge manager last
+ //This is important because zookeeper creates a folder
+ //structure on the local disk. Purge manager also tries
+ //to create it but from a different thread and can run into
+ //race condition. See FileTxnSnapLog source code for details.
+ int port = serverRunner.start(config);
// Start and schedule the purge task
DatadirCleanupManager purgeMgr = new DatadirCleanupManager(
config
- .getDataDir(), config.getDataLogDir(),
+ .getDataDir(), config.getDataLogDir(),
GiraphConstants.ZOOKEEPER_SNAP_RETAIN_COUNT,
GiraphConstants.ZOOKEEPER_PURGE_INTERVAL);
purgeMgr.start();
- serverRunner = new ZooKeeperServerRunner();
- return serverRunner.start(config);
+ return port;
}
/**