[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123961#comment-15123961
 ] 

Chris Nauroth commented on ZOOKEEPER-1936:
------------------------------------------

It's a tough call then.  I don't see a way to write a deterministic JUnit test 
to prove the fix.  I'm reluctant to accept a code change without a test or a 
manual verification, at least on a stable maintenance line.

Here is my take on it.  Even without a consistent repro, I see the theoretical 
problem in the code.  The logic change in the patch looks correct to me, even 
if there might have been something more happening when Ted reported it.  Let's 
put a fix into trunk and branch-3.5, but not branch-3.4.  In trunk and 
branch-3.5, we also can make the switch to 
[{{Files#createDirectories}}|http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#createDirectories(java.nio.file.Path,%20java.nio.file.attribute.FileAttribute...)]
 that I mentioned earlier, because those branches are compiling to JDK 7.  That 
way, if we see another repro, we'll get additional debugging information to 
help with any subsequent patches.

Would some other committers like to comment on that plan?  Thanks!

> Server exits when unable to create data directory due to race 
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Andrew Purtell
>            Priority: Minor
>         Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -               
> .org.apache.zookeeper.server.ZooKeeperServerMain    Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x000000000201d000 nid=0x1727 runnable
> [0x00007f55d7dc7000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.UnixFileSystem.createDirectory(Native Method)
>     at java.io.File.mkdir(File.java:1310)
>     at java.io.File.mkdirs(File.java:1337)
>     at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>     at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
>     at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
>     at java.util.TimerThread.mainLoop(Timer.java:555)
>     at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x00000000027df800 nid=0x1715 runnable
> [0x00007f55d7ed8000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.UnixFileSystem.createDirectory(Native Method)
>     at java.io.File.mkdir(File.java:1310)
>     at java.io.File.mkdirs(File.java:1337)
>     at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>     at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>     at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>     at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>     at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>     at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to