[ https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614992#comment-13614992 ]
Alexander Shraer commented on ZOOKEEPER-1629:
---------------------------------------------
After some debugging, here's what seems to be the problem.
There were some timing-related failures, which the attached patch mostly
addresses; fixing them revealed a different problem.
The failure above is that zk1 sees znode /test2 but zk2 doesn't.
From the log:
2013-03-27 02:39:59,438 [myid:] - INFO [main:TruncateCorruptionTest@160] - List of children at zk2 before zk1 became master
2013-03-27 02:39:59,440 [myid:] - INFO [main:TruncateCorruptionTest@162] - [test, zookeeper, test3]
2013-03-27 02:39:59,440 [myid:] - INFO [main:TruncateCorruptionTest@164] - List of children at zk1 before zk1 became master
2013-03-27 02:39:59,442 [myid:] - INFO [main:TruncateCorruptionTest@166] - [test, zookeeper, test2, test3]
The test is designed so that /test2 is first committed to servers 1 and 3.
The test then deletes the data dir of server 3, disconnects server 1, and has
server 3 form a quorum with server 2. When server 1 connects to the new
ensemble, it is forced to truncate the committed transaction that created
/test2. So why does it still have /test2 in its data tree? Because earlier it
managed to take a snapshot (39:17), and truncation doesn't touch the snapshot.
After the truncation, when we load the database, we first start from the
snapshot and then apply the truncated log. So /test2 showing up is expected in
this case.
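The recovery order described above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not ZooKeeper's actual persistence code: the class and method names (`SnapshotReplayDemo`, `truncate`, `loadDatabase`, `recover`) and the zxids are hypothetical, only node creation is modeled, and it assumes, as this comment says, that truncation leaves the snapshot untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model (not ZooKeeper's API) of the recovery order:
// load the snapshot, then apply logged transactions newer than the snapshot.
public class SnapshotReplayDemo {

    // One logged transaction: its zxid and the znode it creates.
    record Txn(long zxid, String path) {}

    // A snapshot: the znodes present once every txn up to lastZxid was applied.
    record Snapshot(long lastZxid, Set<String> nodes) {}

    // Drop every logged txn after zxid. The snapshot is deliberately not touched.
    static List<Txn> truncate(List<Txn> log, long zxid) {
        List<Txn> kept = new ArrayList<>();
        for (Txn t : log) {
            if (t.zxid() <= zxid) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Rebuild the data tree: start from the snapshot, then replay newer txns.
    static Set<String> loadDatabase(Snapshot snap, List<Txn> log) {
        Set<String> tree = new TreeSet<>(snap.nodes());
        for (Txn t : log) {
            if (t.zxid() > snap.lastZxid()) {
                tree.add(t.path());
            }
        }
        return tree;
    }

    // Hypothetical scenario: server 1 committed /test (zxid 1) and /test2 (zxid 2),
    // then the new leader forces it to truncate back to zxid 1.
    static Set<String> recover(boolean snapshotTakenAfterTest2) {
        List<Txn> log = List.of(new Txn(1, "/test"), new Txn(2, "/test2"));
        Snapshot snap = snapshotTakenAfterTest2
                ? new Snapshot(2, Set.of("/test", "/test2"))
                : new Snapshot(0, Set.of());
        return loadDatabase(snap, truncate(log, 1));
    }

    public static void main(String[] args) {
        System.out.println("with snapshot:    " + recover(true));   // [/test, /test2]
        System.out.println("without snapshot: " + recover(false));  // [/test]
    }
}
```

When the snapshot already contains /test2, nothing in the truncated log removes it, so it reappears after reload; without that snapshot, replaying the truncated log leaves /test2 out. This is the situation the suggestion of disabling snapshotting for the test is meant to avoid.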
If we want to keep the current structure of this test, we should disable
snapshotting for its duration. Is there a way to do that?
> testTrancationLogCorruption occasionally fails
> ----------------------------------------------
>
> Key: ZOOKEEPER-1629
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
> Project: ZooKeeper
> Issue Type: Bug
> Components: tests
> Reporter: Alexander Shraer
> Attachments: TruncateCorruptionTest-patch.patch
>
>
> It seems that testTransactionLogCorruption is very flaky; for example, it fails
> here:
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
> It also fails for older builds (no longer on the website), for example all
> builds from 381 to 399.
--
This message is automatically generated by JIRA.