[
https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620009#comment-13620009
]
Alexander Shraer commented on ZOOKEEPER-1629:
---------------------------------------------
Looking at the log above, I think the snapshotting is the problem:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1439//testReport/org.apache.zookeeper.server/TruncateCorruptionTest/testTransactionLogCorruption/
zk1 still sees /test2 (39:59), even after its log is truncated (39:57). This is
because zk1 had already managed to take a snapshot earlier (39:17). I think the
explanation is this: truncate doesn't touch the snapshot. When we load the
database after the truncate, we first start from the snapshot, then apply the
truncated log. So /test2 still shows up, and we have an inconsistent state
between the servers (only zk1 sees /test2).
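To make the suspected sequence concrete, here is a toy model of it. This is
only a sketch of my reading of the log, not ZooKeeper's actual loading code:
the class SnapshotReplaySketch, the Txn type, and the truncate/load methods
are all hypothetical names made up for illustration, and the zxid values are
invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model; none of these names exist in ZooKeeper.
final class SnapshotReplaySketch {

    /** One "create znode" transaction, a stand-in for a real log record. */
    static final class Txn {
        final long zxid;
        final String path;

        Txn(long zxid, String path) {
            this.zxid = zxid;
            this.path = path;
        }
    }

    /** Drops log entries with a zxid above truncZxid. The snapshot is not touched. */
    static List<Txn> truncate(List<Txn> log, long truncZxid) {
        List<Txn> kept = new ArrayList<>();
        for (Txn t : log) {
            if (t.zxid <= truncZxid) {
                kept.add(t);
            }
        }
        return kept;
    }

    /** Rebuilds state: start from the snapshot, then replay log entries newer than it. */
    static Set<String> load(Set<String> snapshot, long snapshotZxid, List<Txn> log) {
        Set<String> state = new TreeSet<>(snapshot);
        for (Txn t : log) {
            if (t.zxid > snapshotZxid) {
                state.add(t.path);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Txn> log = List.of(new Txn(1, "/test1"), new Txn(2, "/test2"));
        // Assumed: the snapshot was taken after /test2 was created (zxid 2).
        Set<String> snapshot = Set.of("/test1", "/test2");

        // The log is truncated back to zxid 1, removing the /test2 entry.
        List<Txn> truncated = truncate(log, 1);

        // Loading still starts from the untouched snapshot.
        System.out.println(load(snapshot, 2, truncated));
    }
}
```

In this model the truncated log no longer contains /test2, but the loaded
state still does, because it comes from the snapshot. Without the snapshot
(an empty set at zxid 0), the same truncated log yields only /test1.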
I think this is a bug in the test rather than in ZooKeeper, because the test
explicitly erases the state of 2 out of 3 servers in order to force a truncate
of zk1's log. ZooKeeper does not consider this situation possible (for good
reasons), and, as the log shows, it does not work.
I suggest we first figure out whether this is something we can fix. If not,
there is no point in explaining this test better :)
> testTransactionLogCorruption occasionally fails
> -----------------------------------------------
>
> Key: ZOOKEEPER-1629
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
> Project: ZooKeeper
> Issue Type: Bug
> Components: tests
> Reporter: Alexander Shraer
> Attachments: TruncateCorruptionTest-patch.patch
>
>
> It seems that testTransactionLogCorruption is very flaky; for example, it
> fails here:
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
> It also fails for older builds (no longer on the website), for example all
> builds from 381 to 399.