> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, 
> > line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the 
> > signature of the method unless we throw NoSuchElementException. What if we 
> > return null in the case of an exception?

IOException can be thrown from  

hasNext = itr.next();

which for prefetching the next element. 

I think we only need to set hasNext to false when we see exception, so the 
client won't try to call next() again


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/ZKDatabase.java, line 82
> > <https://reviews.apache.org/r/11231/diff/1/?file=293956#file293956line82>
> >
> >     1/3 seems reasonable but arbitrary. I'm curious to know if there is a 
> > concrete reason behind the choice or if it just feels right for the default.

This require some tuning. I will comment in the JIRA since this is related to 
other diff as well


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, 
> > line 523
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line523>
> >
> >     What if we do the fast forwarding in this method instead of having a 
> > boolean passed to init?

sure, I can refactor it


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, 
> > line 545
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line545>
> >
> >     This fast forward parameter is a bit odd. Need to check why that's 
> > necessary and if there is an alternative.

>From learner synchronization perspective, the leader what to send only txns 
>which has zxid > peerLastSeenZxid.  I have to do this either in this layer or 
>in the LearnerHandler. Doing in in this layer is easier to write a unit test 
>and reduce code complexity in LearnerHandler. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, 
> > line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?

This is an existing method, I think we should only rename a method only if its 
semantic has changed or the name is really miss leading.  


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 
> > 677
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line677>
> >
> >     I think this means to say that there is no txn for this epoch yet 
> > because the leader of the epoch is just starting, is it right?

Any zxid should point to a valid txn.  However, this zxid point to NEWLEADER 
txn which is never persist to txnlog. If I don't have special handling for this 
case, the leader will think that this learner has a txn which it has never 
seen. So it will send TRUNC message to the learner and cause it to crash. 

Some follower may report that it has this zxid, if it gets a snapshot when the 
quorum is recently formed or get NEWLEADER package as part of outstanding 
proposals.  

You may also noticed that this patch only changes the leader side. This is 
intentional because I need to support rolling upgrade.   


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 
> > 699
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line699>
> >
> >     Make the log info conditional or use curly braces.

You mean that I should use this? 
http://www.slf4j.org/apidocs/org/slf4j/Logger.html#info%28java.lang.String,%20java.lang.Object...%29
 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 
> > 726
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line726>
> >
> >     I'm not sure I understand the second part of this case. Is this 
> > referring to a follower that is starting with a clean state?

See my comment about NEWLEADER zxid above 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 
> > 806
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line806>
> >
> >     This comment implies that if peerLastZxid == maxZxid, we still queue 
> > maxZxid, but it is not necessary, right? I have a related comment in one of 
> > the test cases.

If there is a better math notation, let me know. This method won't queue 
maxZxid in this case. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 
> > 815
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line815>
> >
> >     suggested name for the method: queueCommittedProposals

no problem


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 
> > 861
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line861>
> >
> >     I'm confused about this case. The only way I can think of right now of 
> > falling into this case is if the follower has tried to join an epoch that 
> > has never become established. 
> >     
> >     Why does the learner crash?

This is also a case when quorum is formed and it has never see any request.  
Then, we restart the quorum.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 
> > 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or 
> > fatal.

Should be WARN or INFO because we just send SNAP whenever we have a case that 
we cannot handle. It is correct but not optimized.  


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java, 
> > line 216
> > <https://reviews.apache.org/r/11231/diff/1/?file=293961#file293961line216>
> >
> >     I'm wondering why we should expect to have a packet in the queue, the 
> > peer is already synced. As I understand this is the case in which 
> > (peerLaxtZxid, maxZxid] = (1, 1].

DIFF packet


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner 
> synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     
> https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java 
> PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 
> 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 
> 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 
> 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java 
> PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 
> 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java 
> PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>

Reply via email to