jonmv commented on PR #1925: URL: https://github.com/apache/zookeeper/pull/1925#issuecomment-1263579500
We've had this running for almost a week now without any issues, and the data inconsistencies have not been observed. The sample size isn't large enough to conclude, though :)

Anyway, we saw some other digest mismatches, and I started digging around for their cause. I found one problem introduced with [this commit](https://github.com/apache/zookeeper/commit/b978dfb949e4ac4d703e956c6ef811415c831bcd), fixed in [8121711](https://github.com/apache/zookeeper/pull/1925/commits/81217117c272b91cd7a91d06568adf6b53047801). The problem was that a `COMMIT` between `NEWLEADER` (which flushes `packetsNotCommited`) and `UPTODATE` would crash the learner, which would peek at this queue and expect entries in it. This is fixed by passing on the entries, but not removing them; instead, the already written entries are simply skipped when updating the log after `UPTODATE`.

Working on the above, I also found the fix for reconfig between `NEWLEADER` and `UPTODATE`, in [this commit](https://github.com/apache/zookeeper/commit/c38787f355b6dcd612fc57db0202fc68a01108f7), to be incomplete: since `packetsNotCommited` is no longer emptied after `NEWLEADER`, the head doesn't change, and if there are other `PROPOSAL`s between the `NEWLEADER` and the `PROPOSAL` that the `COMMITANDACTIVATE` is meant for, then reconfig still doesn't happen. The unit test added back then was insufficient to test this.
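
To make the "pass on, but don't remove" idea concrete, here is a rough sketch of it in isolation. This is not the code from the PR: `LearnerSyncSketch`, its fields, the `on...` handler names, and the `lastLoggedZxid` bookkeeping are all hypothetical, and proposals are reduced to bare zxids. It only assumes what is described above: `NEWLEADER` writes the pending entries without dequeuing them, `COMMIT` peeks at the head of the queue, and the log update after `UPTODATE` skips whatever was already written.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of a learner during sync; NOT ZooKeeper's Learner class.
class LearnerSyncSketch {
    // Proposals received but not yet committed, reduced to their zxid.
    private final Deque<Long> packetsNotCommitted = new ArrayDeque<>();
    // Stand-ins for the transaction log and the committed state.
    final List<Long> log = new ArrayList<>();
    final List<Long> committed = new ArrayList<>();
    // Highest zxid already written to the log (assumed bookkeeping, for illustration only).
    private long lastLoggedZxid = -1;

    void onProposal(long zxid) {
        packetsNotCommitted.add(zxid);
    }

    // NEWLEADER: write pending proposals to the log, but leave them in the queue
    // so that a later COMMIT still finds its entry at the head.
    void onNewLeader() {
        for (long zxid : packetsNotCommitted) {
            log.add(zxid);
            lastLoggedZxid = zxid;
        }
    }

    // COMMIT: peek at the head and expect it to match. With an emptied queue,
    // this is the point where the learner would fail.
    void onCommit(long zxid) {
        Long head = packetsNotCommitted.peekFirst();
        if (head == null || head != zxid) {
            throw new IllegalStateException("COMMIT for " + zxid + " but queue head is " + head);
        }
        committed.add(packetsNotCommitted.removeFirst());
    }

    // UPTODATE: log the remaining entries, skipping those already written at NEWLEADER.
    void onUpToDate() {
        for (long zxid : packetsNotCommitted) {
            if (zxid > lastLoggedZxid) {
                log.add(zxid);
                lastLoggedZxid = zxid;
            }
        }
    }
}
```

Feeding this `PROPOSAL` 1 and 2, then `NEWLEADER`, `COMMIT` 1, `PROPOSAL` 3, and `UPTODATE` leaves each zxid in the log exactly once, and the `COMMIT` in the middle goes through because its entry is still at the head of the queue.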