[
https://issues.apache.org/jira/browse/BOOKKEEPER-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778828#comment-13778828
]
Ivan Kelly commented on BOOKKEEPER-685:
---------------------------------------
[~rakeshr] The case you describe is pretty much the case in the JIRA
description. A single entry can be lost in that scenario.
[~hustlmsp] Setting the flag to false after the flush will work, but it also
creates the possibility that the gc thread hangs: if onRotateEntryLog sets
flushed to true between the addEntry and the set(false), and no more entries
are added, the gc thread waits forever. This is not a big problem in practice,
as more entries should normally be added. Still, there is a very small chance
that circumstances conspire so that we cannot add more entries to the entrylog
without gc getting rid of a different entrylog, and if gc is waiting on a
flush that never happens, we have a deadlock.
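The lost-wakeup window described above can be shown in miniature. This is a
hypothetical sketch (the names `flushed`, `onRotateEntryLog`, and the structure
are assumed for illustration, not the actual BookKeeper code): if the rotation
callback fires between addEntry and set(false), the clear overwrites the
signal, and a gc thread polling the flag would never see it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of why clearing the flag *after* addEntry can lose a wakeup
// from onRotateEntryLog. All names are assumed, not BookKeeper's code.
public class LostFlushSignal {
    static final AtomicBoolean flushed = new AtomicBoolean(false);

    // Sync path calls this when the entry log rotates.
    static void onRotateEntryLog() {
        flushed.set(true);
    }

    public static void main(String[] args) {
        // GC thread: entryLogger.addEntry(...) would happen here, then:
        onRotateEntryLog();   // rotation fires in the window before set(false)
        flushed.set(false);   // GC thread clears the flag -- the wakeup is lost

        // The GC thread would now spin on flushed.get() forever unless
        // another rotation happens, i.e. unless more entries are added.
        System.out.println("flushed=" + flushed.get());
    }
}
```

Running this prints `flushed=false`, i.e. the true set by the rotation has
been silently discarded.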
{quote}
for example, suppose GCThread is compacting entry log file X while adding N
entries to entry log file Y and M entries to entry log file (Y+1). The only
benefit is that you could flush the N entries using the offset when the entry
logger rotates from Y to Y+1. But you still could not flush the M entries
until the entry logger rotates from Y+1 to Y+2, which means you also could not
delete entry log file X at that point, right? So if the bookie crashed at this
point, it would still need to compact entry log file X again, both N and M
entries. The newly flushed N entries don't help anything.{quote}
So, in 4.2.2, flushing is not tied to rotation to start with. Flushing occurs
at a regular interval whenever there is data to be flushed. So if N+M entries
have been added to the entrylog, the sync thread will eventually flush them.
The case you describe won't happen, because if N+M entries have been added to
logs Y & Y+1, the entrylog _will_ flush them. I think this will be clearer in
code. I'll make a patch.
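For what it's worth, the offset-based notification from the issue description
might be sketched roughly like this. Everything here is assumed for
illustration (method and field names are not from the actual patch): the
listener publishes how far the entry log has been synced, and the gc thread
waits until the offset of its own last write is covered, instead of waiting on
a boolean that can refer to an older flush.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an offset-based flush notification.
// Names are assumed; this is not the actual BookKeeper patch.
public class OffsetBasedFlush {
    static final AtomicLong syncedOffset = new AtomicLong(0);

    // Called by the sync thread after a flush, with the synced position.
    static void onEntryLogFlushed(long offset) {
        syncedOffset.accumulateAndGet(offset, Math::max);
    }

    // GC thread: true once the entry written at writeOffset is durable.
    static boolean isDurable(long writeOffset) {
        return syncedOffset.get() >= writeOffset;
    }

    public static void main(String[] args) {
        long writeOffset = 456;            // offset returned by addEntry (assumed)
        onEntryLogFlushed(123);            // an earlier flush: not far enough
        System.out.println(isDurable(writeOffset));
        onEntryLogFlushed(789);            // a later flush covers the write
        System.out.println(isDurable(writeOffset));
    }
}
```

This prints `false` then `true`: a flush notification is only treated as
covering an entry when its offset actually reaches that entry, so a stale
notification cannot be mistaken for a fresh one.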
> Race in compaction algorithm from BOOKKEEPER-664
> ------------------------------------------------
>
> Key: BOOKKEEPER-685
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-685
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Ivan Kelly
> Priority: Blocker
> Fix For: 4.2.2
>
>
> I discovered a race in the algorithm when I was forward porting to trunk.
> 1) Thread1: flushed.set(false)
> 2) Thread2: onRotateEntryLog() // flushed.set(true)
> 3) Thread1: entryLogger addEntry L123-E456
> 4) Thread1: offsets > max, waits for flushed; flushed is true (as set in
> step 2), L123-E456 updated in ledger cache
> 5) Thread2: L123 flushed out of ledger cache
> 6) Crash
> This can possibly lose one entry. I've only reasoned about this, not
> observed it, but it can happen.
> The fix is pretty easy: EntryLoggerListener should notify with the offset
> in the entry log up to which it has synced.
>
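The interleaving in steps 1-6 can also be reproduced in miniature. This sketch
uses assumed names (the real code paths live in the entry logger and the gc
thread): a stale true from a rotation that happened *before* the addEntry lets
the gc thread proceed as if its entry were flushed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the original race (steps 1-6 above). Names are assumed,
// not the actual BookKeeper code.
public class StaleFlagRace {
    static final AtomicBoolean flushed = new AtomicBoolean(false);
    static long addedAt = -1;     // offset of the entry the GC thread wrote
    static long syncedUpTo = -1;  // how far the entry log is actually synced

    // Thread2: rotation flushes the log up to 'offset' and sets the flag.
    static void onRotateEntryLog(long offset) {
        syncedUpTo = offset;
        flushed.set(true);
    }

    public static void main(String[] args) {
        flushed.set(false);    // 1) Thread1
        onRotateEntryLog(100); // 2) Thread2: rotation, synced up to offset 100
        addedAt = 150;         // 3) Thread1: addEntry L123-E456, past the sync
        // 4) Thread1: sees flushed == true and proceeds, although the entry
        //    at offset 150 was written after the flush at offset 100.
        boolean unsafe = flushed.get() && addedAt > syncedUpTo;
        System.out.println(unsafe);
    }
}
```

This prints `true`: the boolean answers "has *a* flush happened?" rather than
"has *my* entry been flushed?", which is exactly what the offset-based
notification is meant to fix.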