[
https://issues.apache.org/jira/browse/ZOOKEEPER-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Norbert Kalmár resolved ZOOKEEPER-3911.
---------------------------------------
Resolution: Fixed
> Data inconsistency caused by DIFF sync uncommitted log
> ------------------------------------------------------
>
> Key: ZOOKEEPER-3911
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3911
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.5.4, 3.6.0, 3.4.12, 3.4.13, 3.5.5, 3.5.6, 3.5.7,
> 3.6.1, 3.5.8
> Reporter: lixun
> Assignee: Michael Han
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.7.0, 3.5.9, 3.6.3
>
> Attachments: example.png
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Since version 3.4, the leader and a quorum of followers do not flush the
> synced data to files immediately when synchronization completes, so the data
> is not yet persisted. At this point the zk server can already serve external
> clients (for example, a webapp). If a failure occurs in this window, phantom
> reads may occur.
> There is an example at this link:
> [example|https://drive.google.com/file/d/1jy3kkVQTDYGb4iV1RaPMBbEWLZZltTQG/view?usp=sharing]
> -----------------mail list-----------------
> mail response from [email protected]
> Hi Xun,
> I think this is a bug, your test case is sound to me. Do you mind
> creating a JIRA for this issue?
> Followers should not ACK NEWLEADER without first ACKing every transaction
> from the DIFF sync. To ACK every transaction, a follower either persists the
> transaction in its log, or takes a snapshot before sending the ACK of the
> NEWLEADER (which we did before ZOOKEEPER-2678, where the snapshot
> optimization was introduced).
> A potential fix I have in mind is to make sure to persist all DIFF sync
> proposals from the LEADER (similar to what we are already doing for proposals
> coming between NEWLEADER and UPTODATE). By doing so, when the leader
> receives the NEWLEADER ACK from a quorum, it's guaranteed that
> every transaction the leader DIFF-synced to followers is quorum committed.
> Thus there will not be inconsistent views moving forward. Alternatively we can
> take a snapshot before ACKing NEWLEADER, but that will be a big performance hit
> for big data trees.
> I am also interested to hear what others think about this.
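As an illustration of the fix proposed above, here is a minimal Java sketch of a follower that either ACKs NEWLEADER with the DIFF proposals still in memory, or persists them first. It is a toy model under stated assumptions, not ZooKeeper's actual Learner code: the names `Follower`, `durableLog`, `pending` and `persistBeforeAck` are hypothetical, and a list append stands in for writing and syncing the transaction log.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a follower during DIFF sync; not ZooKeeper's real Learner code. */
final class DiffSyncSketch {
    /** Hypothetical follower: durableLog survives a crash, pending does not. */
    static final class Follower {
        final List<Long> durableLog = new ArrayList<>();
        final List<Long> pending = new ArrayList<>();
        final boolean persistBeforeAck;

        Follower(boolean persistBeforeAck) {
            this.persistBeforeAck = persistBeforeAck;
        }

        /** A proposal received from the leader as part of the DIFF. */
        void onDiffProposal(long zxid) {
            pending.add(zxid);
        }

        /** Handles NEWLEADER; the ACK is considered sent when this returns. */
        void onNewLeader() {
            if (persistBeforeAck) {
                // Stands in for appending to the txn log and syncing it to disk.
                durableLog.addAll(pending);
                pending.clear();
            }
            // In both variants the NEWLEADER ACK goes out here.
        }

        /** Crash and restart: only the durable log is recovered. */
        long lastZxidAfterCrash() {
            pending.clear();
            return durableLog.isEmpty() ? 0L : durableLog.get(durableLog.size() - 1);
        }
    }

    /** Follower has zxids 1..2 on disk, receives zxid 3 via DIFF, ACKs, then crashes. */
    static long recoveredZxid(boolean persistBeforeAck) {
        Follower f = new Follower(persistBeforeAck);
        f.durableLog.add(1L);
        f.durableLog.add(2L);
        f.onDiffProposal(3L);
        f.onNewLeader();
        return f.lastZxidAfterCrash();
    }

    public static void main(String[] args) {
        System.out.println("ack without persisting: " + recoveredZxid(false));
        System.out.println("persist before ack:     " + recoveredZxid(true));
    }
}
```

In this model the follower that ACKs without persisting restarts at zxid 2 and has lost the transaction the leader believed was acknowledged, while the follower that persists first restarts at zxid 3.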
> On Fri, Aug 28, 2020 at 12:20 AM li xun <[email protected]> wrote:
>
> {quote}There is an example at the link; does it make clear what I mean?
> [https://drive.google.com/file/d/1jy3kkVQTDYGb4iV1RaPMBbEWLZZltTQG/view?usp=sharing]
> Since version 3.4, the leader and a quorum of followers do not flush the
> synced data to files immediately when synchronization completes, so the data
> is not yet persisted. At this point the zk server can already serve external
> clients (for example, a webapp). If a failure occurs in this window, phantom
> reads may occur.
> {quote}
> On Aug 28, 2020, at 14:51, Justin Ling Mao <[email protected]> wrote:
>
> @李珣 The situation you describe may reflect a conceptual misunderstanding of
> how the consensus protocol works:
> {quote}Since the data of the follower when the follower uses the DIFF method
> to synchronize with the leader is still in the memory, it has not had time
> to persist
> {quote}
> 1. The write path is: write to the transaction log (WAL) first, then, after
> reaching consensus, apply to memory, not the other way around.
> {quote}but at this time, the latest zxid_n of the leader has not been
> supported by the quorum of the follower. At this time, if a client connects
> to the leader and sees zxid_n,
> {quote}
> 2. If a write has not been supported by the quorum, it's not safe to apply
> it to the state machine, and the client is not able to see this write.
> I guess that your question may be: how does the system handle
> uncommitted logs when the leader changes?
>
> ----- Original Message -----
> From: Ted Dunning <[email protected]>
> To: [email protected]
> Subject: Re: May violate the ZAB agreement – version 3.6.1
> Date: 2020-08-28 01:25
> How is it that participant A would have a later zxid than the leader?
> In particular, it seems to me that it should be impossible to have these
> two facts be true:
> 1) a transaction has been committed with zxid = z_0. This implies that a
> quorum of the cluster has accepted this transaction and it has been
> committed.
> 2) a new leader election nominates a leader with latest zxid < z_0.
> My reasoning is that any new leader election has to involve a quorum and at
> least a sufficient number of that quorum must have accepted zxid >= z_0 and
> therefore would refuse to be part of the quorum (this is a contradiction).
> Thus, no leader could be elected with zxid < z_0 if fact (1) is true.
> What you are describing seems to require both of these facts.
> Perhaps I am missing something about your suggested scenario. Could you
> describe what you are thinking in more detail?
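The counting step behind the argument above (any election quorum must overlap the quorum that accepted z_0) can be checked by brute force for small ensembles. The following Java sketch assumes simple majority quorums and represents each set of servers as a bitmask; the class and method names are illustrative only. Note that it verifies the overlap only, and says nothing about whether the overlapping server still holds the transaction after a restart.

```java
/** Brute-force check that any two majority quorums of n servers share a member. */
final class QuorumIntersectionSketch {
    /** A set of servers (bitmask) is a majority if it has more than n/2 members. */
    static boolean isMajority(int mask, int n) {
        return Integer.bitCount(mask) > n / 2;
    }

    /** Returns true if every pair of majority quorums over n servers overlaps. */
    static boolean allMajoritiesIntersect(int n) {
        int limit = 1 << n;
        for (int a = 0; a < limit; a++) {
            if (!isMajority(a, n)) {
                continue;
            }
            for (int b = 0; b < limit; b++) {
                // Two disjoint majorities would be a counterexample.
                if (isMajority(b, n) && (a & b) == 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println("n=" + n + " -> " + allMajoritiesIntersect(n));
        }
    }
}
```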
> On Thu, Aug 27, 2020 at 2:08 AM 李珣 <[email protected]> wrote:
>
> {quote}version 3.6.1
> org.apache.zookeeper.server.quorum.Learner.java line:605
> Suppose the following situation:
> zxid_n is the largest zxid of Participant A (the leader has just recovered
> from downtime), and zxid_n has not been acknowledged by a quorum. Assume
> Participant A is elected Leader and a follower then uses DIFF to
> synchronize data with it. After sending UPTODATE, the leader can already
> serve external clients, but at this point the leader's latest zxid_n is
> still not backed by a quorum of followers. Suppose a client connects to the
> leader and sees zxid_n, and then both the leader and the follower go down.
> For some reason the leader cannot be restarted, while the follower starts
> normally, so a new leader can only be elected from the followers. Because
> the data the follower received through the DIFF sync was still only in
> memory and had not yet been persisted, the newly elected leader does not
> have zxid_n. But zxid_n was already seen by a client on the old leader, so
> there will be inconsistencies in the data view.
> Is the above situation possible?
> {quote}
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)