jonmv commented on PR #1925:
URL: https://github.com/apache/zookeeper/pull/1925#issuecomment-1263579500

   We've had this running for almost a week now without any issues, and the 
data inconsistencies have not been observed again. The sample size isn't large 
enough to draw a conclusion, though :)
   
   
   Anyway, we saw some other digest mismatches, and I started digging around 
for their cause. I found one problem introduced with [this 
commit](https://github.com/apache/zookeeper/commit/b978dfb949e4ac4d703e956c6ef811415c831bcd),
 fixed in 
[8121711](https://github.com/apache/zookeeper/pull/1925/commits/81217117c272b91cd7a91d06568adf6b53047801).
   The problem was that a `COMMIT` arriving between `NEWLEADER` (which flushes 
`packetsNotCommited`) and `UPTODATE` would crash the learner, since the learner 
peeks at this queue and expects to find entries in it. The fix passes the 
entries on without removing them from the queue; the entries that have already 
been written are then simply skipped when updating the log after `UPTODATE`. 
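
   A minimal sketch of the "write but don't remove, then skip" idea described above. This is a simplified model, not the actual `Learner` code: the class `LearnerSyncSketch`, its methods, and the `lastLoggedZxid` marker are all hypothetical names, zxids are plain `long`s, and the sketch assumes they arrive in increasing order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of a learner during sync; not ZooKeeper's Learner class.
public class LearnerSyncSketch {
    // Stands in for packetsNotCommited: proposals received but not yet committed.
    private final Deque<Long> pending = new ArrayDeque<>();
    private final List<Long> txnLog = new ArrayList<>();
    private final List<Long> committed = new ArrayList<>();
    // Highest zxid already written to the log (assumes increasing zxids).
    private long lastLoggedZxid = Long.MIN_VALUE;

    public void onProposal(long zxid) {
        pending.addLast(zxid);
    }

    // NEWLEADER: write pending entries to the log, but leave them in the queue.
    public void onNewLeader() {
        writePendingToLog();
    }

    // COMMIT: peeks at the head of the queue and expects the matching entry.
    public void onCommit(long zxid) {
        Long head = pending.peekFirst();
        if (head == null || head != zxid) {
            throw new IllegalStateException(
                "COMMIT 0x" + Long.toHexString(zxid) + " does not match head of queue");
        }
        pending.removeFirst();
        committed.add(zxid);
    }

    // UPTODATE: write whatever is still pending, skipping what NEWLEADER already wrote.
    public void onUpToDate() {
        writePendingToLog();
    }

    private void writePendingToLog() {
        for (long zxid : pending) {
            if (zxid <= lastLoggedZxid) {
                continue; // already written earlier; do not log it twice
            }
            txnLog.add(zxid);
            lastLoggedZxid = zxid;
        }
    }

    public List<Long> txnLog() {
        return txnLog;
    }

    public List<Long> committed() {
        return committed;
    }
}
```

   In this model, a `COMMIT` that arrives after `NEWLEADER` still finds its entry at the head of the queue, and no entry is written to the log twice.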
   
   Working on the above, I also found the fix for reconfig between `NEWLEADER` 
and `UPTODATE`, in [this 
commit](https://github.com/apache/zookeeper/commit/c38787f355b6dcd612fc57db0202fc68a01108f7),
 to be incomplete: since `packetsNotCommited` is no longer emptied after 
`NEWLEADER`, the head of the queue doesn't change, and if there are other 
`PROPOSAL`s between the `NEWLEADER` and the `PROPOSAL` that the 
`COMMITANDACTIVATE` is meant for, the reconfig still doesn't happen. The unit 
test added back then was insufficient to catch this. 
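
   To illustrate the scenario above (this is only an illustration, not the change made in the PR): a lookup that inspects just the head of the queue misses a reconfig `PROPOSAL` that sits behind other `PROPOSAL`s, while a lookup by zxid finds it. The class `ReconfigLookupSketch`, the `Pending` type, and both method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; names and types are not taken from ZooKeeper.
public class ReconfigLookupSketch {
    // A pending proposal: its zxid and whether it carries a reconfig.
    public static final class Pending {
        public final long zxid;
        public final boolean isReconfig;

        public Pending(long zxid, boolean isReconfig) {
            this.zxid = zxid;
            this.isReconfig = isReconfig;
        }
    }

    // Inspects only the head of the queue; returns null if the head is not the target.
    public static Pending headOnlyLookup(Deque<Pending> queue, long zxid) {
        Pending head = queue.peekFirst();
        return (head != null && head.zxid == zxid) ? head : null;
    }

    // Scans the whole queue for the entry with the given zxid; null if absent.
    public static Pending scanLookup(Deque<Pending> queue, long zxid) {
        for (Pending p : queue) {
            if (p.zxid == zxid) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Deque<Pending> queue = new ArrayDeque<>();
        queue.addLast(new Pending(1L, false)); // ordinary PROPOSAL, still at the head
        queue.addLast(new Pending(2L, true));  // reconfig PROPOSAL behind it

        // COMMITANDACTIVATE for zxid 2: the head-only lookup misses the reconfig.
        System.out.println("head-only found: " + (headOnlyLookup(queue, 2L) != null));
        System.out.println("scan found:      " + (scanLookup(queue, 2L) != null));
    }
}
```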


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

