[
https://issues.apache.org/jira/browse/ZOOKEEPER-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774327#comment-13774327
]
Germán Blanco commented on ZOOKEEPER-87:
----------------------------------------
Thanks to you for taking a look, Flavio!
If it is idle, then it doesn't do anything. The implementation is a class that
has 4 methods:
- start() : Invoked from LearnerHandler thread.
- updateProposal() : Invoked from the thread that sends packets. If monitoring
has started and there is no current proposal, it starts monitoring this
proposal. If there is a current proposal already, it updates the "next
proposal" to monitor.
- updateAck() : Invoked from the thread that receives packets. If the ACK
corresponds to the current proposal, then it removes the current proposal and
if there is a "next proposal" it starts using that one.
- check() : Invoked from the Leader thread. It returns true if everything is
ok, false if there is a timeout.
It has the following states:
- NOT_STARTED: It doesn't do anything. Check will always return true. It can
only transition to WAITING_FOR_PROPOSAL.
- WAITING_FOR_PROPOSAL: It expects a new proposal. Check will always return
true. As soon as one proposal arrives it will start monitoring it and
transition to MONITORING_PROPOSAL.
- MONITORING_PROPOSAL: It waits for the ACK of the proposal under monitoring.
Check will return true unless there is a timeout for the current proposal. It
continuously updates the "next proposal" with incoming proposals. If the ACK for
the current proposal arrives, it will either update the proposal with the "next
proposal" or transition to WAITING_FOR_PROPOSAL if there is no "next proposal".
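To make the description above concrete, here is a minimal sketch of such a
class. It is not the code in the patch: the class name, field names, and
method signatures are made up for illustration, zxids are plain longs, and the
caller passes in the clock. It also assumes that the timer of a "next
proposal" counts from the moment that proposal was sent.

```java
/** Hypothetical sketch of the monitor described above; not the patch code. */
final class ProposalAckMonitor {
    enum State { NOT_STARTED, WAITING_FOR_PROPOSAL, MONITORING_PROPOSAL }

    private static final long NONE = -1L;      // marker for "no proposal"

    private final long timeoutMs;              // syncLimit * tickTime
    private State state = State.NOT_STARTED;
    private long currentZxid = NONE;           // proposal under monitoring
    private long currentSentMs;                // when it was sent
    private long nextZxid = NONE;              // latest proposal seen meanwhile
    private long nextSentMs;

    ProposalAckMonitor(int syncLimit, int tickTimeMs) {
        this.timeoutMs = (long) syncLimit * tickTimeMs;
    }

    /** Called once the synchronization phase ends. */
    synchronized void start() {
        if (state == State.NOT_STARTED) {
            state = State.WAITING_FOR_PROPOSAL;
        }
    }

    /** Called by the thread that sends packets. */
    synchronized void updateProposal(long zxid, long nowMs) {
        switch (state) {
            case WAITING_FOR_PROPOSAL:
                currentZxid = zxid;
                currentSentMs = nowMs;
                state = State.MONITORING_PROPOSAL;
                break;
            case MONITORING_PROPOSAL:
                // Overwrite: only the most recent proposal is remembered.
                nextZxid = zxid;
                nextSentMs = nowMs;
                break;
            default:
                break;                         // NOT_STARTED: ignore
        }
    }

    /** Called by the thread that receives packets. */
    synchronized void updateAck(long zxid) {
        if (state != State.MONITORING_PROPOSAL || zxid != currentZxid) {
            return;                            // not the ACK we are waiting for
        }
        if (nextZxid != NONE) {
            currentZxid = nextZxid;
            currentSentMs = nextSentMs;
            nextZxid = NONE;
        } else {
            currentZxid = NONE;
            state = State.WAITING_FOR_PROPOSAL;
        }
    }

    /** Called by the Leader thread; false means a timeout. */
    synchronized boolean check(long nowMs) {
        return state != State.MONITORING_PROPOSAL
                || nowMs - currentSentMs < timeoutMs;
    }
}
```

Each call only touches a few fields under one lock, which is why the cost per
call stays small in this sketch.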
The monitoring is started when the synchronization phase ends (and the timeout
in the socket is updated to syncLimit*tickTime). The processing required in
each of the calls to the four functions is minimal, so there should be no
disturbance of performance at all.
Since proposals are executed in order, the longest time this check allows
between a proposal and its ACK occurs for a proposal that is sent immediately
after the one being monitored and whose ACK arrives immediately before the ACK
for the "next proposal" that is monitored after it. That would mean
(2 * syncLimit) * tickTime, and since we check the timeout every 1/2 tickTime,
that needs to be added, so it is (2 * syncLimit + 0.5) * tickTime. In usual
operation, ACKs will take more or less the same time for every proposal, and
the timeout will fire more or less as soon as any of the ACKs takes longer than
(syncLimit * tickTime).
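As a worked example of the arithmetic above, the following sketch computes
both figures. It reads "that needs to be added" as adding one check interval
of 1/2 tickTime on top of 2 * syncLimit * tickTime. The class and method names
are hypothetical, and the values used afterwards (tickTime = 2000 ms,
syncLimit = 5) are only sample settings, not something taken from the patch.

```java
/** Hypothetical helper; only illustrates the timeout arithmetic above. */
final class AckTimeoutBounds {
    /** Usual case: the timeout fires once an ACK takes this long. */
    static long nominalTimeoutMs(int syncLimit, int tickTimeMs) {
        return (long) syncLimit * tickTimeMs;
    }

    /** Worst case: two back-to-back monitoring windows plus one check interval. */
    static long worstCaseDetectionMs(int syncLimit, int tickTimeMs) {
        long twoWindows = 2L * syncLimit * tickTimeMs;
        long checkInterval = tickTimeMs / 2;   // check() runs every 1/2 tickTime
        return twoWindows + checkInterval;
    }
}
```

With tickTime = 2000 ms and syncLimit = 5, nominalTimeoutMs gives 10000 ms and
worstCaseDetectionMs gives 21000 ms.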
Or at least that is how I intend it to work :-).
... and I don't mind turning this upside down for anything else that covers the
requirement, so I hope that we won't have a problem converging on a patch
... we'll see.
Sorry about deleting the patches. It made me uncomfortable to leave something
there that didn't work. But I will keep that way of working in mind from now
on.
> Follower does not shut itself down if its too far behind the leader.
> --------------------------------------------------------------------
>
> Key: ZOOKEEPER-87
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-87
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.5.0, 3.4.5
> Reporter: Mahadev konar
> Assignee: Germán Blanco
> Priority: Critical
> Labels: patch
> Fix For: 3.5.0, 3.4.6
>
> Attachments: ZOOKEEPER-87_3.4.patch, ZOOKEEPER-87.patch
>
>
> Currently, a follower that is lagging behind keeps sending pings to the
> leader, so it stays alive and keeps getting further and further behind the
> leader. The follower should shut itself down if it is not able to keep up
> with the leader within some limit, so that guarantees about updates can be
> made to the clients connected to different servers.