[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855521#comment-13855521
 ] 

Bruno Freudensprung commented on ZOOKEEPER-1835:
------------------------------------------------

Before explaining the latest patch, let me summarize its content:
- I created a new AtomicFileWritingIdiom...
 - reusing the existing AtomicFileOutputStream class,
 - inspired by the way the  writeLongToFile method of QuorumPeer is using it,
- ... and used this new AtomicFileWritingIdiom class in the writeDynamicConfig 
method of QuorumPeerConfig

It took me some time to understand what was going wrong in QuorumPeerMainTest 
and ObserverHierarchicalQuorumTest unit tests with my second patch.
During my investigations I discovered strange stack traces in the junit reports 
of the (unpatched) trunk, related to AtomicFileOutputStream :
{code}
java.nio.channels.ClosedByInterruptException
        at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
        at 
org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1499)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1520)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:523)
        at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:980)
{code}
They occur in:
{code}
TEST-org.apache.zookeeper.server.quorum.Zab1_0Test.txt
TEST-org.apache.zookeeper.test.ObserverHierarchicalQuorumTest.txt
TEST-org.apache.zookeeper.test.ReadOnlyModeTest.txt
{code}
There seems to be a problem with the "force(true)" call on the FileChannel. I 
don't know why to be honnest (maybe related to the threading model of 
zookeeper).
If I simply remove this line (on the unpatched trunk), 
ClosedByInterruptExceptions disappeared but I observed the same unit test 
regressions in QuorumPeerMainTest as with my second patch.
However when I replaced:
{code:java}
((FileOutputStream) out).getChannel().force(true);
{code}
with:
{code:java}
((FileOutputStream) out).getFD().sync();
{code}
then there was no ClosedByInterruptException in the report, and no unit test 
regression.
And if I apply my patch (the third one, including this modification), there is 
no unit test regression neither (latest Hadoop QA result).

It seems that both fc.force() and fd.sync() have the same effect of flushing 
data on the device:
http://stackoverflow.com/questions/5650327/are-filechannel-force-and-filedescriptor-sync-both-needed

I've tested the ZOOKEEPER-1691 patch in addition to this new one and the 
configBackwardCompatibilityMode scenario. Both are still working.


> dynamic configuration file renaming fails on Windows
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-1835
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1835
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.0
>         Environment: Windows 7 64-bit, Oracle Java  1.6.0_32-b05
>            Reporter: Bruno Freudensprung
>            Assignee: Bruno Freudensprung
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1835.patch, ZOOKEEPER-1835.patch, 
> ZOOKEEPER-1835.patch
>
>
> On Windows, reconfig fails to rename the tmp dynamic config file to the real 
> dynamic config filename.
> Javadoc of java.io.File.renameTo says the behavior is highly plateform 
> dependent, so I guess this should not be a big surprise.
> The problem occurs in 
> src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java that 
> could be modified like this:
> + curFile.delete();
> if (!tmpFile.renameTo(curFile)) {
> + configFile.delete();
> if (!tmpFile.renameTo(configFile)) {
> As suggested by Alex in https://issues.apache.org/jira/browse/ZOOKEEPER-1691 
> (btw there is more information about my test scenario over there) it is a bit 
> "scary" to delete the current configuration file.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to