[
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated ZOOKEEPER-1936:
-------------------------------------
Assignee: Ted Yu
Fix Version/s: 3.5.3, 3.4.10
[~fpj], thank you for looking. I am proposing that we fix this for 3.5.3 and
3.4.10. I am proposing separate patches for the 2 branches, so that we can use
the NIO APIs with the richer error reporting in trunk/branch-3.5 and use the
JDK 6 APIs in branch-3.4.
I now realize that the trunk/branch-3.5 patch wasn't ready. Thanks for
pointing out the problem. Ted, would you please update that patch
to catch the exception from {{Files#createDirectories}} and allow the method to
succeed if the directory already exists? That will keep it similar to the
branch-3.4 logic. Would you please upload new patch files for both
trunk/branch-3.5 and branch-3.4? That will help eliminate the current
confusion about which patch files to use.
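
To make the request concrete, here is a minimal sketch of the behavior being asked for on trunk/branch-3.5. It is not the actual patch: the class name {{DataDirSketch}} and the method name {{ensureDirectory}} are hypothetical, and the real change would live in {{FileTxnSnapLog}} and keep throwing {{DatadirException}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class DataDirSketch {

    /**
     * Creates the directory (and any missing parents). If creation fails but
     * the directory is present afterwards, for example because another thread
     * created it first, the call is treated as a success.
     */
    static void ensureDirectory(Path dir) throws IOException {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            // Only swallow the failure when the end state is what we wanted.
            if (!Files.isDirectory(dir)) {
                throw new IOException("Unable to create data directory " + dir, e);
            }
        }
    }
}
```

Calling {{ensureDirectory}} twice on the same path succeeds both times, while pointing it at an existing regular file still fails and keeps the original NIO exception as the cause, which is where the richer error reporting comes from.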
Ted has contributed the most recent patches, which I am proposing to commit
after another revision, so I'll assign the issue to him.
bq. I'm actually wondering why we added that DatadirException. I'd much rather
just keep it IOException instead.
This traces back to ZOOKEEPER-1161, which introduced the ability to disable
automatic directory creation. Part of that includes special handling of
{{DatadirException}} in {{QuorumPeerMain}} and {{ZooKeeperServerMain}} so that
they can return unique exit codes when directory creation fails. I think we
need to keep this as is for now to preserve backward compatibility.
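
To illustrate why the distinct exception type matters, here is a rough sketch of the pattern. Everything in it is hypothetical: {{DataDirFailure}} stands in for {{DatadirException}}, the exit code values are made up for the example, and the real handling in {{QuorumPeerMain}} and {{ZooKeeperServerMain}} may be structured differently.

```java
import java.io.IOException;

// Hypothetical illustration; names and exit code values are assumptions.
final class ExitCodeSketch {

    /** Stand-in for a dedicated data directory exception type. */
    static class DataDirFailure extends IOException {
        DataDirFailure(String message) {
            super(message);
        }
    }

    // Illustrative values only, not ZooKeeper's actual exit codes.
    static final int EXIT_OK = 0;
    static final int EXIT_UNEXPECTED = 1;
    static final int EXIT_DATADIR = 3;

    /** Startup logic that may throw. */
    interface Startup {
        void run() throws Exception;
    }

    /** Maps the outcome of startup to a process exit code. */
    static int runAndMapExitCode(Startup startup) {
        try {
            startup.run();
            return EXIT_OK;
        } catch (DataDirFailure e) {
            // The more specific type must be caught before its supertypes.
            return EXIT_DATADIR;
        } catch (Exception e) {
            return EXIT_UNEXPECTED;
        }
    }
}
```

If the constructor threw a plain {{IOException}} instead, the first {{catch}} clause would never match, and scripts that rely on the dedicated exit code would see the generic one.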
> Server exits when unable to create data directory due to race
> --------------------------------------------------------------
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6, 3.5.0
> Reporter: Harald Musum
> Assignee: Ted Yu
> Priority: Minor
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch,
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this error in the
> log:
> [2014-05-27 09:29:48.248] ERROR : -
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x000000000201d000 nid=0x1727 runnable
> [0x00007f55d7dc7000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x00000000027df800 nid=0x1715 runnable
> [0x00007f55d7ed8000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), a purge might
> run at the same time as the server itself starts. The FileTxnSnapLog
> constructor checks whether the directory exists and creates it if not. When
> the two threads do this at the same time, one of the mkdirs calls fails and
> the server exits the JVM.
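
For reference, the race described in the report can be tolerated with the JDK 6 {{java.io.File}} API by re-checking the directory after a failed {{mkdirs}}. This is only a sketch of the idea, not the committed branch-3.4 patch; the class name {{MkdirsRaceSketch}} and the method name {{ensureDirectory}} are hypothetical.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class MkdirsRaceSketch {

    /**
     * File#mkdirs returns false both on a real failure and when the directory
     * already exists, which includes the case where another thread created it
     * between an exists() check and this call. Re-checking isDirectory()
     * separates the harmless case from the real failure.
     */
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }
}
```

With this shape, the thread that loses the race sees that the directory is present and carries on, so only a genuine failure to create the directory reaches the caller.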
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)