[
https://issues.apache.org/jira/browse/BOOKKEEPER-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778828#comment-13778828
]
Ivan Kelly commented on BOOKKEEPER-685:
---------------------------------------
[~rakeshr] The case you describe is pretty much the case in the JIRA
description. A single entry can be lost in that scenario.
[~hustlmsp] Setting the flag to false after the flush will work, but it also
creates the possibility that the gc thread hangs: if onRotateEntryLog sets
flushed to true between the addEntry and the set(false), and no more entries
are added, the gc thread waits forever. This is not a big problem in practice,
as more entries should normally be added. Still, there is a very small chance
that circumstances conspire so that we cannot add more entries to the entrylog
without gc getting rid of a different entrylog, and if gc is waiting on a
flush that never happens, we have a deadlock.
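The lost-wakeup window described above can be shown in miniature. This is a
hypothetical sketch (the names `flushed`, `onRotateEntryLog`, and the structure
are assumed for illustration, not the actual BookKeeper code): if the rotation
callback fires between addEntry and set(false), the clear overwrites the
signal, and a gc thread polling the flag would never see it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of why clearing the flag *after* addEntry can lose a wakeup
// from onRotateEntryLog. All names are assumed, not BookKeeper's code.
public class LostFlushSignal {
    static final AtomicBoolean flushed = new AtomicBoolean(false);

    // Sync path calls this when the entry log rotates.
    static void onRotateEntryLog() {
        flushed.set(true);
    }

    public static void main(String[] args) {
        // GC thread: entryLogger.addEntry(...) would happen here, then:
        onRotateEntryLog();   // rotation fires in the window before set(false)
        flushed.set(false);   // GC thread clears the flag -- the wakeup is lost

        // The GC thread would now spin on flushed.get() forever unless
        // another rotation happens, i.e. unless more entries are added.
        System.out.println("flushed=" + flushed.get());
    }
}
```

Running this prints `flushed=false`, i.e. the true set by the rotation has
been silently discarded.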
{quote}
for example, suppose GCThread is compacting entry log file X while adding N
entries to entry log file Y and M entries to entry log file (Y+1). The only
benefit is that you could flush the N entries using the offset when the entry
logger rotates from Y to Y+1. But you still could not flush the M entries
until the entry logger rotates from Y+1 to Y+2, which means you also could not
delete entry log file X at that point, right? So if the bookie crashed at this
point, it would still need to compact entry log file X again, both N and M
entries. The newly flushed N entries don't help anything.{quote}
So, in 4.2.2, flushing is not tied to rotation to start with. Flushing occurs
at a regular interval whenever there is data to be flushed. So if N+M entries
have been added to the entrylog, the sync thread will eventually flush them.
The case you describe won't happen, because if N+M entries have been added to
logs Y & Y+1, the entrylog _will_ flush them. I think this will be clearer in
code. I'll make a patch.
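For what it's worth, the offset-based notification from the issue description
might be sketched roughly like this. Everything here is assumed for
illustration (method and field names are not from the actual patch): the
listener publishes how far the entry log has been synced, and the gc thread
waits until the offset of its own last write is covered, instead of waiting on
a boolean that can refer to an older flush.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an offset-based flush notification.
// Names are assumed; this is not the actual BookKeeper patch.
public class OffsetBasedFlush {
    static final AtomicLong syncedOffset = new AtomicLong(0);

    // Called by the sync thread after a flush, with the synced position.
    static void onEntryLogFlushed(long offset) {
        syncedOffset.accumulateAndGet(offset, Math::max);
    }

    // GC thread: true once the entry written at writeOffset is durable.
    static boolean isDurable(long writeOffset) {
        return syncedOffset.get() >= writeOffset;
    }

    public static void main(String[] args) {
        long writeOffset = 456;            // offset returned by addEntry (assumed)
        onEntryLogFlushed(123);            // an earlier flush: not far enough
        System.out.println(isDurable(writeOffset));
        onEntryLogFlushed(789);            // a later flush covers the write
        System.out.println(isDurable(writeOffset));
    }
}
```

This prints `false` then `true`: a flush notification is only treated as
covering an entry when its offset actually reaches that entry, so a stale
notification cannot be mistaken for a fresh one.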
> Race in compaction algorithm from BOOKKEEPER-664
> ------------------------------------------------
>
> Key: BOOKKEEPER-685
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-685
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Ivan Kelly
> Priority: Blocker
> Fix For: 4.2.2
>
>
> I discovered a race in the algorithm when I was forward porting to trunk.
> 1) Thread1: flushed.set(false)
> 2) Thread2: onRotateEntryLog() // flushed.set(true)
> 3) Thread1: entryLogger addEntry L123-E456
> 4) Thread1: offsets > max, waits for flushed; flushed is true (as set in
> step 2), L123-E456 updated in ledger cache
> 5) Thread2: L123 flushed out of ledger cache
> 6) Crash
> This can possibly lose one entry. I've only reasoned about this, not
> observed it, but it can happen.
> The fix is pretty easy: EntryLoggerListener should notify with the offset
> in the entry log up to which it has synced.
>
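The interleaving in steps 1-6 can also be reproduced in miniature. This sketch
uses assumed names (the real code paths live in the entry logger and the gc
thread): a stale true from a rotation that happened *before* the addEntry lets
the gc thread proceed as if its entry were flushed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the original race (steps 1-6 above). Names are assumed,
// not the actual BookKeeper code.
public class StaleFlagRace {
    static final AtomicBoolean flushed = new AtomicBoolean(false);
    static long addedAt = -1;     // offset of the entry the GC thread wrote
    static long syncedUpTo = -1;  // how far the entry log is actually synced

    // Thread2: rotation flushes the log up to 'offset' and sets the flag.
    static void onRotateEntryLog(long offset) {
        syncedUpTo = offset;
        flushed.set(true);
    }

    public static void main(String[] args) {
        flushed.set(false);    // 1) Thread1
        onRotateEntryLog(100); // 2) Thread2: rotation, synced up to offset 100
        addedAt = 150;         // 3) Thread1: addEntry L123-E456, past the sync
        // 4) Thread1: sees flushed == true and proceeds, although the entry
        //    at offset 150 was written after the flush at offset 100.
        boolean unsafe = flushed.get() && addedAt > syncedUpTo;
        System.out.println(unsafe);
    }
}
```

This prints `true`: the boolean answers "has *a* flush happened?" rather than
"has *my* entry been flushed?", which is exactly what the offset-based
notification is meant to fix.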