[
https://issues.apache.org/jira/browse/ZOOKEEPER-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798219#comment-13798219
]
Akshay Chander commented on ZOOKEEPER-1674:
-------------------------------------------
I am working with Thawan on this feature. I'd appreciate comments and
suggestions for the analysis done so far.
Retaining the database across leader election should improve the recovery time
after leader election. In order to support such a feature, the following
changes will be required to ensure that the existing behavior is maintained.
1) Anything that has reached the PrepRequestProcessor should make it to the
SyncRequestProcessor. Similarly, anything that has reached the commitProcessor
should eventually reach the FinalRequestProcessor. To maintain this invariant:
a) Currently, we drop the database and reload from disk (snapshot + txnlog). We
can effectively mimic this behavior in one of two ways.
i) We retain outstandingProposals and toBeApplied (in the case of leader)
or pendingTxns (in the case of followers) across the leader election.
We will apply the txns in these data structures to the data tree
before calling getInitLastLoggedZxid in lookForLeader(). This will ensure that
the lastSeenZxid sent by the participant during the leader election will remain
the same as before this feature.
ii) Alternatively, we could apply these txns to the data tree during the
shutdown phase. This way, we don't need to do the extra work of persisting these
data structures across leader elections.
b) During shutdown, we should ensure that all appends to the txnlog have
actually been flushed to the disk.
c) By retaining the zkDataBase, we will also be retaining the
sessionsWithTimeouts, which is a listing of global sessions. We need to ensure
that this listing is still clean after the leader election.
Leader: If there is an upgrade request for a session (from local to
global), we add it to the global session tracker. Since this is going to
persist across leader election, we need to ensure that the txn corresponding to
this createSession is present in at least the txnlog.
Therefore we need to ensure that requests that are in the
PrepRequestProcessor make their way to the SyncRequestProcessor even if there
is a shutdown at any point in between.
d) Ensure that anything in the FinalRequestProcessor gets applied to the Data
Tree.
2) Don't take a dirty snapshot. We don't want txns that haven't been accepted
by a majority of the quorum to be part of any snapshot. Currently, we take
snapshots on shutdown and in loadData, which we will stop doing.
3) In followers, there is a bug in the local session code. When there is an
upgrade request, we currently remove the session from the local session
tracker and add it to globalSessionWithTimeouts in the local request processor
itself (checkUpgradeSession).
We probably should not add it to the global sessions just yet and let it be
done in the final request processor.
4) Another small bug: In LearnerSessionTracker::touchSession, currently if a
session is not in the localSessionTracker and not a global session, then we
return false. This should not be the case any longer.
This is because we may have removed the session from the local session
tracker for an upgrade request. So just add it to the touchTable and return
true.
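To make item 4 concrete, here is a minimal Java sketch of the current and the
proposed touchSession behavior. It is a simplified model and not the actual
ZooKeeper code: SketchLearnerSessionTracker, beginUpgrade, touchSessionCurrent
and touchSessionProposed are hypothetical names, and the trackers are reduced
to plain sets and a map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for LearnerSessionTracker (assumption:
// the real trackers are reduced here to in-memory sets and a map).
class SketchLearnerSessionTracker {
    private final Set<Long> localSessions = new HashSet<>();
    private final Set<Long> globalSessions = new HashSet<>();
    // Sessions touched since the last ping to the leader, with their timeouts.
    private final Map<Long, Integer> touchTable = new HashMap<>();

    void addLocalSession(long sessionId) {
        localSessions.add(sessionId);
    }

    // Models an upgrade request as proposed in item 3: the session leaves the
    // local tracker but is not yet registered as a global session.
    void beginUpgrade(long sessionId) {
        localSessions.remove(sessionId);
    }

    // Current behavior: a session that is neither local nor global is rejected.
    boolean touchSessionCurrent(long sessionId, int timeout) {
        if (localSessions.contains(sessionId)) {
            return true;
        }
        if (!globalSessions.contains(sessionId)) {
            return false;
        }
        touchTable.put(sessionId, timeout);
        return true;
    }

    // Proposed behavior: a session missing from the local tracker may be in
    // the middle of an upgrade, so record the touch and return true.
    boolean touchSessionProposed(long sessionId, int timeout) {
        if (localSessions.contains(sessionId)) {
            return true;
        }
        touchTable.put(sessionId, timeout);
        return true;
    }

    int touchTableSize() {
        return touchTable.size();
    }
}
```

In this model, a session that has started an upgrade is rejected by
touchSessionCurrent but accepted by touchSessionProposed, which also records
it in the touchTable.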
This analysis was done on our internal branch which is based off 3.4. Therefore,
we haven't investigated how this feature would be affected by the Dynamic
Reconfiguration feature.
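For reference, option 1(a)(i) above can be sketched in Java as follows. This
is only a toy model under stated assumptions, not the real QuorumPeer or
ZKDatabase code: SketchTxn, SketchRetainedDatabase and
lastLoggedZxidForElection are hypothetical names, and the data tree is reduced
to a map. The point it shows is that the retained txns are applied before the
zxid used for the election is read.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical txn record: a zxid plus a single "set data" operation.
class SketchTxn {
    final long zxid;
    final String path;
    final String data;

    SketchTxn(long zxid, String path, String data) {
        this.zxid = zxid;
        this.path = path;
        this.data = data;
    }
}

// Hypothetical stand-in for a database retained across leader election.
class SketchRetainedDatabase {
    private final Map<String, String> dataTree = new HashMap<>();
    // Txns that were logged but not yet applied when the election started
    // (assumption: stands in for toBeApplied / pendingTxns).
    private final Deque<SketchTxn> pendingTxns = new ArrayDeque<>();
    private long lastProcessedZxid = 0L;

    void logTxn(SketchTxn txn) {
        pendingTxns.addLast(txn);
    }

    // Apply retained txns in zxid order, as a reload from disk would have done.
    void applyPendingTxns() {
        SketchTxn txn;
        while ((txn = pendingTxns.pollFirst()) != null) {
            dataTree.put(txn.path, txn.data);
            lastProcessedZxid = txn.zxid;
        }
    }

    // Drain first, then report the zxid, so the value sent during the
    // election matches what a drop-and-reload would have produced.
    long lastLoggedZxidForElection() {
        applyPendingTxns();
        return lastProcessedZxid;
    }

    String read(String path) {
        return dataTree.get(path);
    }
}
```

Option 1(a)(ii) would call the equivalent of applyPendingTxns during shutdown
instead, so nothing needs to be carried over to the next election round.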
> There is no need to clear & load the database across leader election
> --------------------------------------------------------------------
>
> Key: ZOOKEEPER-1674
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1674
> Project: ZooKeeper
> Issue Type: Improvement
> Reporter: Jacky007
>
> It is interesting to notice this piece of code in QuorumPeer.java
> /* ZKDatabase is a top level member of quorumpeer
> * which will be used in all the zookeeperservers
> * instantiated later. Also, it is created once on
> * bootup and only thrown away in case of a truncate
> * message from the leader
> */
> private ZKDatabase zkDb;
> It was introduced by ZOOKEEPER-596. Now, we just drop the database on every
> leader election.
> We can keep it safely with ZOOKEEPER-1549.