[ https://issues.apache.org/jira/browse/ZOOKEEPER-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673574#comment-13673574 ]
Flavio Junqueira commented on ZOOKEEPER-1413:
---------------------------------------------
I tried this patch with the test case from ZOOKEEPER-876, and the test fails.
Here are a couple of lines from the output log:
{noformat}
2013-06-03 22:42:25,396 [myid:] - INFO [LearnerHandler-/127.0.0.1:61449:LearnerHandler@583] - Synchronizing with Follower sid: 3 maxCommittedLog=0x100000005 minCommittedLog=0x100000001 lastProcessedZxid=0x100000005 peerLastZxid=0x0
2013-06-03 22:42:25,396 [myid:] - INFO [LearnerHandler-/127.0.0.1:61450:LearnerHandler@583] - Synchronizing with Follower sid: 4 maxCommittedLog=0x100000005 minCommittedLog=0x100000001 lastProcessedZxid=0x100000005 peerLastZxid=0x100000005
{noformat}
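For reading the values above: a ZooKeeper zxid packs the leader epoch into its high 32 bits and a per-epoch counter into its low 32 bits. The small Python helper below splits the logged values; `split_zxid` is an illustrative name, not a ZooKeeper API.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter): epoch in the high 32 bits, counter in the low 32."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# Values copied from the LearnerHandler@583 lines above.
logged = {
    "minCommittedLog": 0x100000001,
    "maxCommittedLog": 0x100000005,
    "peerLastZxid (sid 3)": 0x0,
    "peerLastZxid (sid 4)": 0x100000005,
}
for name, value in logged.items():
    epoch, counter = split_zxid(value)
    print(f"{name}: epoch={epoch} counter={counter}")
```

Decoded this way, sid 4 is already at epoch 1, counter 5 (the leader's maxCommittedLog), while sid 3 reports zxid 0, meaning no transactions at all, and the in-memory committedLog spans counters 1 through 5 of epoch 1.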
Server 5 is the leader, and it ends up sending a snapshot to server 3 even
though the committedLog contains all the txns that server 3 needs:
{noformat}
2013-06-03 22:42:25,401 [myid:] - INFO [LearnerHandler-/127.0.0.1:61449:LearnerHandler@368] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000 sent zxid of db as 0x100000005
{noformat}
and this message seems to be related:
{noformat}
2013-06-03 22:42:25,398 [myid:] - WARN [LearnerHandler-/127.0.0.1:61449:ZKDatabase@307] - Unable to find proposals from txnlog for zxid: 0
{noformat}
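To make the boundary concrete, here is a hypothetical Python model of a leader-side sync decision; it is not the patch's code. It assumes the leader picks DIFF only when `minCommittedLog <= peerLastZxid <= maxCommittedLog`, TRUNC when the peer is ahead, and otherwise falls back to the on-disk txnlog or a snapshot. The zxid values come from the log lines above; `choose_sync` and its return labels are illustrative names.

```python
# Hypothetical model of a leader-side sync decision. The names and labels are
# illustrative; this is not ZooKeeper's LearnerHandler code.

def choose_sync(peer_last: int, min_committed: int, max_committed: int) -> str:
    """Pick a sync mode from a strict range check on the in-memory log."""
    if peer_last > max_committed:
        return "TRUNC"           # peer is ahead of anything the leader committed
    if min_committed <= peer_last:
        return "DIFF"            # committedLog covers every zxid after peer_last
    return "TXNLOG_OR_SNAP"      # older than committedLog: try disk, else snapshot


# committedLog from the LearnerHandler@583 lines: epoch 1, counters 1..5.
committed_log = [0x100000000 + c for c in range(1, 6)]
min_c, max_c = committed_log[0], committed_log[-1]

for sid, peer_last in [(3, 0x0), (4, 0x100000005)]:
    mode = choose_sync(peer_last, min_c, max_c)
    needed = [hex(z) for z in committed_log if z > peer_last]
    print(f"sid {sid}: {mode}, missing {needed}")
```

For sid 3 the strict lower bound routes the request to the txnlog/snapshot path even though every zxid it lacks is in `committed_log`, which is consistent with the snapshot and the `ZKDatabase@307` warning above. Treating 0x0 as immediately preceding 0x100000001 is only safe if 0x100000001 is the first transaction the ensemble ever committed; that is an assumption here, not something the log proves.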
> Use on-disk transaction log for learner sync up
> -----------------------------------------------
>
> Key: ZOOKEEPER-1413
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.3
> Reporter: Thawan Kooburat
> Assignee: Thawan Kooburat
> Priority: Minor
> Labels: performance
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1413.patch, ZOOKEEPER-1413.patch, ZOOKEEPER-1413.patch
>
>
> Motivation:
> The learner syncs up with the leader by retrieving the committed log from the
> leader. Currently, the leader keeps only the 500 most recently committed
> entries in memory. If the learner falls behind by more than 500 updates, the
> leader sends the entire snapshot to the learner.
> Given the snapshot size of some of our ZooKeeper deployments (~10 GB), it is
> prohibitively expensive to send the entire snapshot over the network.
> Additionally, our ZooKeeper may serve more than 4K updates per second. As a
> result, a network hiccup of less than a second will cause the learner to
> fall back to snapshot transfer.
> Design:
> Instead of looking only at the committed log in memory, the leader will also
> look at the transaction log on disk. The amount of transaction log kept on
> disk is configurable, and the current default is 100k. This will allow
> ZooKeeper to tolerate longer temporary network failures before initiating a
> snapshot transfer.
> Implementation:
> We plan to add an interface to the persistence layer that can be used to
> retrieve proposals from the on-disk transaction log. These proposals can then
> be sent to the learner using the existing protocol.
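The Design and Implementation notes above can be sketched as a lookup order: serve the learner from the in-memory committedLog when it covers the peer, otherwise ask the on-disk txnlog, and only then fall back to a snapshot. This is a hypothetical Python model, not the patch: `TxnLogStore`, `proposals_after`, `plan_sync` and `tolerated_lag_seconds` are made-up names, TRUNC handling is omitted, and "100k" is read as 100,000 retained transactions. The continuity rule in `proposals_after` (the log must still contain `peer_last` itself) is one conservative choice among several.

```python
from typing import List, Optional, Tuple


class TxnLogStore:
    """Hypothetical stand-in for the on-disk transaction log (zxid order)."""

    def __init__(self, zxids: List[int]) -> None:
        self.zxids = sorted(zxids)

    def proposals_after(self, peer_last: int) -> Optional[List[int]]:
        """Return logged zxids > peer_last, or None if the log no longer
        contains peer_last itself, so continuity cannot be verified."""
        if not self.zxids or peer_last < self.zxids[0]:
            return None
        return [z for z in self.zxids if z > peer_last]


def plan_sync(peer_last: int, committed_log: List[int],
              txn_log: TxnLogStore) -> Tuple[str, List[int]]:
    """Try the in-memory committedLog, then the on-disk txnlog, then a snapshot.
    TRUNC (peer ahead of the leader) is left out of this sketch."""
    if committed_log and committed_log[0] <= peer_last <= committed_log[-1]:
        return "DIFF", [z for z in committed_log if z > peer_last]
    from_disk = txn_log.proposals_after(peer_last)
    if from_disk is not None:
        return "DIFF", from_disk
    return "SNAP", []


def tolerated_lag_seconds(retained_txns: int, updates_per_sec: int) -> float:
    """How long a learner can fall behind before the retained log runs out."""
    return retained_txns / updates_per_sec


EPOCH = 1 << 32
disk = TxnLogStore([EPOCH | c for c in range(1, 100_001)])   # assumed 100k on disk
memory = [EPOCH | c for c in range(99_501, 100_001)]         # last 500 in memory

mode, props = plan_sync(EPOCH | 90_000, memory, disk)        # served from disk
print(mode, len(props))
print(tolerated_lag_seconds(500, 4_000), tolerated_lag_seconds(100_000, 4_000))
```

With 500 entries in memory at 4K updates per second, the tolerated lag is 500 / 4000 = 0.125 s; keeping 100,000 transactions on disk stretches it to 25 s. Note that this conservative continuity rule also refuses a peer at zxid 0, the same situation as sid 3 in the comment above.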