[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694114#comment-13694114
 ] 

Vinay commented on BOOKKEEPER-636:
----------------------------------

Adding analysis to avoid very big description.


Analysis:
------------
1. EveryTime new journal txn log will be created with the nanoTime() as the id.
{code}          if (null == logFile) {
                    logId = MathUtils.now();
                    logFile = new JournalChannel(journalDirectory, logId);{code}

2. System.nanoTime() is the relative time with the system startup. It will 
start from 0 everytime system re-started. refer 
http://java.dzone.com/articles/how-arbitary-start


3. before reboot, last txn logId was very huge, but just after restart, latest 
txn id became *40974.txn*. 1c9d2f436.txn  was the last txn file.
 As part of first roll latest txn file was garbage collected.
{noformat}2013-06-25 14:38:14,730 - INFO  [SyncThread:Journal@341] - garbage 
collected journal 40974.txn{noformat}

4. After this one more restart gave the problem.


*Conclusion*:
We should not use {{System.nanoTime()}} for the storage and compare purposes.
                
> Latest txn logs might be deleted in a race condition which is not recoverable 
> if BK goes down before next txn log created.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-636
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-636
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.1, 4.3.0
>            Reporter: Vinay
>            Assignee: Vinay
>            Priority: Blocker
>
> With the following scenario latest transaction log can be deleted.
> 1. more than {{journalMaxBackups}} txn logs are there in journal dir.
> 2. BK machine was up for long time and the latest txn log id is some what 
> huge number
> 3. Now reboot the machine.
> 4. after reboot BK restarted.
> 5. Now, Immediately after startup, One entry is added, due to which Synthread 
> rolled the lastMark in ledger dirs before the lastLogId updated by Journal 
> thread. (this lastMark was having the old logId which was before reboot). 
> 6. Now after roll, old journal txn logs were gc'ed. *Now latest created the 
> txn log was deleted.*
> 7. After this Journal thread updated the lastLogMark, also some more rolls 
> happened.
> 8. Now BK restarted again. But BK was not able to start because it was not 
> able to find the latest txn log file in journal directory.
> {noformat}java.io.IOException: Recovery log 264564 is missing
>         at org.apache.bookkeeper.bookie.Journal.replay(Journal.java:424)
>         at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:547)
>         at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:603)
>         at 
> org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:111){noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to